# FMRI Playground
## Which fMRI analysis method should you use? (in progress)
https://docs.google.com/document/d/1lAu_k9oaUh_tlgM5sx4X3HDcnI2Qv-gK6y_JsJMJ4jo/edit
## List of all methods pages (in order top to bottom)
(can see in-progress jupyter notebooks here: https://nbviewer.jupyter.org/github/PaulScotti/fmriplayground/tree/main/methods/)
% = my idea of how complete it is
1. Univariate analysis (95%)
2. Searchlight (60%)
3. Correlation-based multivariate pattern analysis (MVPA) (5%)
4. Linear multivariate pattern classification (MVPC) (25%)
5. Representational similarity analysis (RSA) (5%)
6. Phase-encoded mapping / retinotopic mapping (30%)
7. Population receptive field mapping (pRF) (5%)
8. Dimensionality reduction (60%)
9. Encoding and decoding models (95%)
10. Neural networks (30%)
11. Functional connectivity (15%)
12. Graph theory / network neuroscience (70%)
13. Causal modeling / effective connectivity (70%)
14. Intersubject models (ISC SRM IS-RSA fingerprinting)
# 1. Univariate Analysis
Univariate analysis is a common and basic approach to testing whether some experimental condition induces a reliable difference in activation within a single voxel or a single brain region. Univariate analysis typically involves comparing the mean activation across voxels in one condition against the mean activation across voxels in another condition. That is, for each participant, obtain the average brain activations for at least two conditions within a brain region (raw BOLD signal or beta weights can be used as the estimate of brain activity, where beta weights are obtained from general linear model estimation*) and conduct a t-test across participants on the difference in activations between conditions to see whether the brain region is consistently more or less active across people in response to certain conditions.
* Regarding general linear models, general refers to the different types of analyses the framework subsumes (e.g., correlation, one-sample t-test, ANOVA, etc.) and linear refers to the assumption that the measured signal is a weighted (linear) combination of the model's regressors (linear models are often preferred because nonlinear models are prone to overfitting, where a model appears to fit the data well but does not generalize to new data). *Generally* speaking (get it?), the general linear model is one common way to obtain estimates of brain activation (in arbitrary units) for every voxel after accounting for things like the hemodynamic response function (time-course of blood flow) and body motion in the scanner.
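As a minimal sketch of the group-level step (using made-up numbers rather than real beta weights), the comparison reduces to a paired t-test across participants on each participant's mean ROI activation in the two conditions:

```python
# Minimal univariate ROI analysis sketch on made-up data.
# Assumes activation estimates (e.g., beta weights) have already been averaged
# across the ROI's voxels for each participant and condition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 20

# Simulated mean ROI activation per participant for two conditions,
# with condition A set to be slightly more active than condition B.
activation_A = rng.normal(loc=1.0, scale=1.0, size=n_subjects)
activation_B = rng.normal(loc=0.5, scale=1.0, size=n_subjects)

# Paired t-test across participants on the condition difference.
t_val, p_val = stats.ttest_rel(activation_A, activation_B)
print(f"t({n_subjects - 1}) = {t_val:.2f}, p = {p_val:.3f}")
```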
## Repetition Suppression
Repetition suppression, or fMRI adaptation, is a univariate analysis approach that exploits how neural activity adapts to repeated presentations of the same stimulus. For example, if you present an image of Jennifer Aniston (or any face), the fusiform face area (FFA) will show increased activation. If you then immediately present the same image again, activation will be less pronounced than on the first presentation because the FFA specializes in the processing of faces and adapts to the repeated face image. Alternatively, if you instead show a picture of Barack Obama, activation may be as high as on the first presentation of Jennifer Aniston because the FFA had adapted to Jennifer Aniston and not Barack Obama. This difference can then be contrasted with a region of the brain that is not selective for faces, such as early visual cortex, where such adaptation would likely not be observed. In this way, repetition suppression gauges the selectivity of a brain region by testing for reduced activation to repeated presentations of a stimulus.
## Relevant information
### Univariate analysis vs. multivariate pattern analysis
Whereas univariate analysis typically compares the mean activation between conditions, multivariate pattern analysis (MVPA) tests for differences between conditions using the distributed pattern of activation across voxels (without averaging them), which usually makes MVPA more sensitive than univariate analysis.
[Davis et al. (2014): What Do Differences Between Multi-voxel and Univariate Analysis Mean? How Subject-, Voxel-, and Trial-level Variance Impact fMRI Analysis](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4115449/)
### Beginner friendly handbook on fMRI analysis
[Poldrack, Mumford, & Nichols (2011): Handbook of functional MRI data analysis. Cambridge University Press](https://www.cambridge.org/core/books/handbook-of-functional-mri-data-analysis/8EDF966C65811FCCC306F7C916228529)
### Example of univariate analysis
[Kanwisher, McDermott, & Chun (1997): The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception](https://www.jneurosci.org/content/jneuro/17/11/4302.full.pdf)
### Examples of repetition suppression
[Henson, R. N. (2003): Neuroimaging studies of priming.](https://pubmed.ncbi.nlm.nih.gov/12927334/)
[Dilks, Julian, Kubilius, Spelke, & Kanwisher (2011): Mirror-image sensitivity and invariance in object and scene processing pathways](https://www.jneurosci.org/content/31/31/11305)
# 2. Searchlight
If the goal of the researcher is to explore where the brain is most responsive to a task, it may be useful to look at a statistical map (t-map, f-map) or to employ searchlight mapping.
A statistical map involves repeating the analysis for every voxel separately and plotting the corresponding statistical value (e.g., t or F value) for each voxel onto the brain.
A searchlight involves repeating an analysis (e.g., a univariate analysis) within many small spherical regions of interest. Typically a sphere is centered on each voxel in the brain, the analysis is run on just the voxels within that sphere, and the resulting statistic is assigned to the sphere's center voxel; you can then, for example, look for the sphere with the highest t-value. A minimal sketch follows.
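Below is a minimal searchlight sketch on simulated data (all dimensions and the sphere radius are arbitrary assumptions); in practice, packages such as nilearn and BrainIAK provide searchlight implementations that handle brain masks and parallelization.

```python
# Minimal searchlight sketch on simulated data (arbitrary dimensions).
# For every spherical neighborhood, the same univariate test is run and the
# resulting statistic is stored in a whole-brain map.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, dims, radius = 12, (10, 10, 10), 2

# Simulated per-subject condition-difference maps (condition A minus B).
diff_maps = rng.normal(size=(n_subjects,) + dims)

# Coordinates of every voxel in the (toy) brain volume.
coords = np.array(np.meshgrid(*[np.arange(d) for d in dims],
                              indexing="ij")).reshape(3, -1).T

t_map = np.zeros(dims)
for center in coords:
    # Select the voxels inside the sphere centered on this voxel.
    in_sphere = np.linalg.norm(coords - center, axis=1) <= radius
    idx = coords[in_sphere]
    # Average the condition difference within the sphere for each subject,
    # then run a one-sample t-test across subjects against zero.
    sphere_vals = diff_maps[:, idx[:, 0], idx[:, 1], idx[:, 2]].mean(axis=1)
    t_map[tuple(center)] = stats.ttest_1samp(sphere_vals, 0).statistic

print("Peak searchlight t-value:", t_map.max().round(2))
```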
## Relevant information
# 3. Correlation-based multivariate pattern analysis (MVPA)
## Relevant information
### Book chapter on different types of MVPA
[Lewis-Peacock & Norman (2013): Multi-Voxel Pattern Analysis of fMRI Data](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.475.4280&rep=rep1&type=pdf)
# 4. Linear multivariate pattern classification (MVPC)
## Relevant information
### MVPC vs. inverted encoding models
MVPC considers different stimuli as discrete classes and can only predict classes that were used in the training of the model. IEM considers stimuli as varying along a continuous dimension, utilizes hypothesized neural tuning functions to build the model, and can generate predictions for stimulus values that were not used to train the model.
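For reference, here is a minimal sketch of what linear MVPC itself can look like in code, assuming simulated trial-by-voxel data and scikit-learn; the voxel counts, signal strength, and classifier settings are arbitrary.

```python
# Minimal linear MVPC sketch on simulated trial-by-voxel data (arbitrary sizes),
# assuming scikit-learn. Each trial is a pattern of voxel activations labeled
# by condition, and a linear classifier is evaluated with cross-validation.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 500

# Simulated patterns for two classes (e.g., faces vs. scenes), with a weak
# signal added to class 1 that is distributed across all voxels.
X = rng.normal(size=(n_trials, n_voxels))
y = np.repeat([0, 1], n_trials // 2)
X[y == 1] += 0.1

# Linear classifier with 5-fold cross-validation; chance accuracy is 0.5.
clf = LinearSVC(C=1.0, max_iter=10000)
accuracy = cross_val_score(clf, X, y, cv=5)
print(f"Mean cross-validated accuracy: {accuracy.mean():.2f}")
```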
### MVPC vs. neural networks
Linear MVPC requires less data than neural networks and identifies a more interpretable decision boundary for making predictions. Neural networks, by contrast, require much more data and computational power and are less interpretable; however, given enough data, they will generally perform better than linear MVPC because they can learn nonlinear decision boundaries (cf. the universal approximation theorem).
### More info on data-driven neuroimaging
[Brunton & Beyeler (2019): Data-driven models in human neuroscience and neuroengineering](https://www.sciencedirect.com/science/article/pii/S0959438818302502)
### Cross-subject decoding
Cross-subject decoding is the same idea, except the classifier is trained on data from some participants and tested on data from held-out participants, rather than training and testing within the same participant.
# 5. Representational similarity analysis (RSA)
## Relevant information
### Overview of RSA
[Kriegeskorte, Mur, & Bandettini (2008): Representational similarity analysis - connecting the branches of systems neuroscience](https://www.frontiersin.org/articles/10.3389/neuro.06.004.2008/full)
### What is representational similarity: Video lecture
[Dr. Rebecca Saxe's fMRI Bootcamp: Part 7](https://www.youtube.com/watch?v=bQhg8H6iS_s)
# 6. Phase-encoded mapping / Traveling wave / Retinotopic mapping
Phase-encoded mapping (also known as traveling-wave or retinotopic mapping) is a method for mapping how the visual field is represented across visual cortex. A periodic stimulus sweeps through the visual field while the fMRI signal is recorded; the phase of each voxel's response at the stimulus frequency indicates which part of the visual field that voxel prefers. Two stimuli are typically used:

* Expanding rings (for measuring eccentricity)
* Rotating wedges (for measuring polar angle)

Dougherty et al. (2003) show how eccentricity and polar angle can be mapped onto early visual areas.
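A minimal sketch of the phase-encoded analysis for a single simulated voxel (the number of volumes, TR, and cycle count are arbitrary assumptions): the phase of the voxel's Fourier component at the stimulus frequency indicates its preferred position in the visual field.

```python
# Minimal phase-encoded analysis sketch for one simulated voxel (arbitrary
# timing parameters). With a periodic stimulus (e.g., a rotating wedge), the
# phase of the voxel's response at the stimulus frequency indicates its
# preferred polar angle (or eccentricity, for expanding rings).
import numpy as np

n_timepoints, tr, n_cycles = 240, 1.0, 10   # 240 volumes, 1 s TR, 10 stimulus cycles
t = np.arange(n_timepoints) * tr

# Simulate a voxel that responds maximally a quarter of the way through each cycle.
true_phase = np.pi / 2
clean = np.cos(2 * np.pi * n_cycles * t / (n_timepoints * tr) - true_phase)
voxel_ts = clean + np.random.default_rng(0).normal(scale=0.5, size=n_timepoints)

# Fourier transform: the complex coefficient at the stimulus frequency
# (the bin at n_cycles) carries the amplitude and phase of the periodic response.
spectrum = np.fft.rfft(voxel_ts)
estimated_phase = -np.angle(spectrum[n_cycles])
print(f"Estimated phase: {estimated_phase:.2f} rad (true phase: {true_phase:.2f} rad)")
```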
## Relevant information
### Background/overview of phase-encoded mapping
[Engel (2012): The development and use of phase-encoded functional MRI designs.](https://pubmed.ncbi.nlm.nih.gov/21985909/)
### Paper presenting visual field representations of early visual areas
[Dougherty, Koch, Brewer, Fischer, Modersitzki, & Wandell, (2003): Visual field representations and locations of visual areas V1/2/3 in human visual cortex.](https://pubmed.ncbi.nlm.nih.gov/14640882/)
### Retinotopic Maps: Video lecture
[Lecture from Dr. Nancy Kanwisher on retinotopic maps](https://www.youtube.com/watch?v=MhFJIgeY-ZY)
### Fourier transform overview: Video lecture
[Lecture from Dr. Geoffrey Aguirre on Fourier transforms](https://www.youtube.com/watch?v=J1XYcIj86TI&t=4579s)
# 7. Population receptive field mapping (pRF)
## Relevant information
### Paper that introduced pRF mapping
[Dumoulin & Wandell (2008): Population receptive field estimates in human visual cortex](https://pubmed.ncbi.nlm.nih.gov/17977024/)
# 8. Dimensionality reduction (Jiageng_doing this part)
In fMRI analyses, our datasets often have a very high number of features (voxels) relative to a small number of trials or participants. For example, we may want to decode the attended location in a fast event-related design using data from occipital and parietal regions. In this case, the total number of features (voxels) could be more than 10,000, but we may only collect about 200 trials, so the number of features is far higher than the number of trials. You might think this is not a problem, or even a good thing, because the machine learning algorithm could "magically" pick out the most informative features. In reality that is not true: 1) most of the features may just be noise, and adding noise does not help the model; 2) the redundant features may cause overfitting, where the model learns to fit the noise and performance drops significantly when the model is applied to test data. This is often called the "curse of dimensionality" or the "small-n-large-p" problem. Therefore, we often need to reduce the number of features using dimensionality reduction techniques. More generally, dimensionality reduction is often used for preprocessing, compression, denoising, visualization, etc.
## Commonly used dimensionality reduction methods
### Averaging
You may be using dimensionality reduction already and not even know it! Dimensionality reduction is simply reducing the complexity of your data into fewer dimensions, so if you have 100 voxels and average across them all into 1 data point, you are going from 100 dimensions to 1 dimension. Averaging is technically a dimensionality reduction technique.
### Principal Components Analysis (PCA)
One of the most commonly used dimensionality reduction techniques is PCA. The basic idea of PCA is to reduce the data (e.g., 100 voxels) to a smaller set of components (e.g., 20 components) that explain most of the variability in the data. These components are uncorrelated with each other and can be ranked by how much variance they explain; intuitively, the most "informative" component is the one that explains the most variability. PCA is purely data-driven: it does not care about the classes the data represent, and the resulting components may not be easily interpretable. A minimal sketch follows.
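A minimal PCA sketch on simulated voxel data, assuming scikit-learn (the trial and voxel counts are arbitrary):

```python
# Minimal PCA sketch on simulated trial-by-voxel data (arbitrary sizes),
# assuming scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 10000

# Simulated data with a few latent sources spread across many voxels plus noise.
latent = rng.normal(size=(n_trials, 5))
mixing = rng.normal(size=(5, n_voxels))
X = latent @ mixing + rng.normal(scale=0.5, size=(n_trials, n_voxels))

# Reduce 10,000 voxels to 20 uncorrelated components ranked by explained variance.
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X)

print("Reduced shape:", X_reduced.shape)   # (200, 20)
print("Variance explained by the first 5 components:",
      pca.explained_variance_ratio_[:5].round(3))
```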
### Independent Components Analysis (ICA)
Similar to PCA, but the components must be statistically independent, not just uncorrelated. You can think of it as trying to "unmix" your data. While PCA tries to reduce your data to fewer variables while preserving as much of its richness as possible, ICA tries to deduce the separate signals that were mixed together to produce your observed data.
### Linear Discriminant Analysis (LDA)
In contrast to PCA, which is unsupervised, LDA is a supervised technique that can be used for data reduction. For LDA, you need to provide not only the data but also the class associated with each data point. LDA then tries to create a reduced set of components that best separate the data according to the provided classes.
### Factor analysis
Factor analysis is another data reduction technique, typically used with survey data. Each survey item is treated as a measured indicator of one or more unobserved latent factors; the survey as a whole measures one or several related latent factors. The latent factors represent true constructs that cannot be measured directly but manifest in the observed item scores. For example, personality cannot be measured directly, but an underlying personality trait affects how each item on a personality scale is rated, so factor analysis can be used to reduce very long personality test responses to a small number of personality traits.
### Hidden Markov Model
## Relevant information
### A review of feature reduction techniques in neuroimaging
[Mwangi, Tian, & Soares (2014): A review of feature reduction techniques in neuroimaging](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4040248/)
### PCA explained visually
[Powell & Lehe (2014): Principal Component Analysis Explained Visually](https://setosa.io/ev/principal-component-analysis/)
### FMRI-based PCA tutorial using Python (BrainIAK)
[BrainIAK: Dimensionality Reduction](https://brainiak.org/tutorials/04-dimensionality-reduction/)
### Dimensionality reduction: Video lecture
[Neuromatch 2020: Dimensionality Reduction Intro](https://www.youtube.com/watch?v=zeBFyRaoVnQ)
### Nonlinear dimensionality reduction (manifold learning)
[Nonlinear dimensionality reduction in scikit-learn](https://scikit-learn.org/stable/modules/manifold.html)
# 9. Encoding and decoding models (Paul plans to do this)
Encoding models refer to any model that inputs stimulus or task information and outputs predicted brain activations. Decoding models refer to any model that inputs brain activations and outputs predicted stimulus or task information. Given how general these terms are, there are various approaches for implementing encoding and decoding models. If you are specifically interested in formulating a model that can predict brain activations from task information, then you may only want to use an encoding model. A decoding model can also be obtained by inverting an encoding model (see inverted encoding models below).
## Relevant information
### Overview of voxel-based encoding and decoding models
[Naselaris, Kay, Nishimoto, & Gallant (2011): Encoding and decoding in fMRI](https://pubmed.ncbi.nlm.nih.gov/20691790/)
### Overview of inverted encoding modeling
Inverted encoding models are a simple, linear approach to encoding and decoding. The output of an inverted encoding model is a reconstruction, which is akin to a population-level tuning function over the stimulus space of interest. An inverted encoding model is used in the example simulation code below.
[Scotti, Chen, & Golomb (preprint): An enhanced inverted encoding model for neural reconstructions](https://www.biorxiv.org/content/10.1101/2021.05.22.445245v4)
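A minimal sketch of an inverted encoding model on simulated data (the tuning functions, channel count, and noise level are arbitrary assumptions, not the exact model from the paper above): an encoding model is fit from hypothesized channel responses to voxels, then inverted to reconstruct channel responses for held-out trials.

```python
# Minimal inverted encoding model sketch on simulated data. The tuning
# functions, channel count, and noise level are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_channels = 200, 50, 100, 8
channel_centers = np.linspace(0, 2 * np.pi, n_channels, endpoint=False)

def channel_responses(stimuli):
    # Idealized tuning functions over a circular stimulus space.
    diffs = stimuli[:, None] - channel_centers[None, :]
    return np.cos(diffs / 2) ** 6          # shape: (n_trials, n_channels)

# Simulate voxels as random mixtures of the channel responses plus noise.
true_weights = rng.normal(size=(n_channels, n_voxels))
stim_train = rng.uniform(0, 2 * np.pi, n_train)
stim_test = rng.uniform(0, 2 * np.pi, n_test)
C_train, C_test = channel_responses(stim_train), channel_responses(stim_test)
B_train = C_train @ true_weights + rng.normal(scale=0.5, size=(n_train, n_voxels))
B_test = C_test @ true_weights + rng.normal(scale=0.5, size=(n_test, n_voxels))

# Encoding step: estimate channel-to-voxel weights with least squares.
W_hat = np.linalg.lstsq(C_train, B_train, rcond=None)[0]

# Inversion step: estimate channel responses for held-out voxel patterns.
C_hat = B_test @ W_hat.T @ np.linalg.inv(W_hat @ W_hat.T)

# The reconstructed channel-response profile should peak near the true stimulus.
decoded = channel_centers[C_hat.argmax(axis=1)]
circular_error = np.angle(np.exp(1j * (decoded - stim_test)))
print("Mean absolute decoding error (rad):", np.abs(circular_error).mean().round(2))
```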
### Encoding & decoding models vs. Multivariate pattern classification
Encoding and decoding models are not limited to classification (where the model can only output a trained class); instead, they may "reconstruct" stimulus information across a continuous space. For example, a classification model may only be capable of classifying a trial as a face or a scene (i.e., correct or incorrect guesses), whereas an encoding/decoding model may reconstruct the stimulus itself (i.e., output a blurry-looking recreation of the stimulus).

[Nishimoto et al. (2011): Reconstructing visual experiences from brain activity evoked by natural movies](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326357/)
### Potential pitfalls of inferring neural representations with encoding models
[Popov, Ostarek, & Tenison (2018): Practices and pitfalls in inferring neural representations](https://pubmed.ncbi.nlm.nih.gov/29578030/)
# 10. Neural networks
Note that "neural networks" are not at all related to actual "neurons"!
## Different types of neural networks
### Perceptron
A feedforward neural network with no hidden layers, also known as a single-layer neural network (because there is a single layer of weights connecting the input layer to the output layer). A perceptron can perform linear regression or linear classification, whereas more complicated neural networks perform nonlinear operations. Perceptrons are basically identical to linear support vector machines in terms of functionality, with the exception that a perceptron seeks any decision boundary that can separate the classes, whereas a support vector machine seeks the decision boundary that maximizes the margin between the classes. For this reason, support vector machines are generally preferable to perceptrons. Perceptrons are still very useful for educational purposes because they represent the most basic kind of neural network.

Diagram of how a perceptron works in the case of voxel data. The left side of the diagram depicts the input layer, where each node is a unique voxel. The right side depicts the output layer, consisting of a single node outputting whether the presented stimulus was 0 or 1. Each voxel's activation is multiplied by a voxel-specific weight, the products are summed, and a bias term is added. The result can be used directly (for regression problems) or fed into an activation function (e.g., a step function) for classification problems. The predictions are compared to the ground truth, and if they differ the weights and bias are adjusted.
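A minimal sketch of the perceptron just described, on simulated voxel data (the learning rate, number of epochs, and data are arbitrary assumptions):

```python
# Minimal perceptron sketch matching the description above, on simulated voxel
# data (the learning rate, epochs, and data are arbitrary assumptions).
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_voxels = 100, 50

# Simulated voxel patterns; the class label depends only on the first 5 voxels,
# so the two classes are linearly separable.
X = rng.normal(size=(n_trials, n_voxels))
y = (X[:, :5].mean(axis=1) > 0).astype(int)

weights = np.zeros(n_voxels)
bias = 0.0
learning_rate = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = int(weights @ xi + bias > 0)   # weighted sum + bias -> step function
        error = target - prediction                 # 0 if correct, +/-1 if wrong
        weights += learning_rate * error * xi       # adjust weights only on mistakes
        bias += learning_rate * error

accuracy = (((X @ weights + bias) > 0).astype(int) == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```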
### Deep neural network (DNN)
A feedforward neural network with at least one hidden layer between the input and output layers.
### Convolutional neural network (CNN)
A DNN with convolutional filters.
### Recurrent neural network (RNN)
## Relevant information
### PyTorch: Accessible Python package for implementing neural networks
[PyTorch](https://pytorch.org/)
# 11. Functional connectivity
## Relevant information
### Overview of functional connectivity using fMRI
[Rogers, Morgan, Newton, & Gore (2007): Assessing functional connectivity in the human brain by fMRI](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2169499/)
### Example of functional connectivity
[Hampson, Tokoglu, Sun, Schafer, Skudlarski, Gore, & Constable (2006): Connectivity-behavior analysis reveals that functional connectivity between left BA39 and Broca's area varies with reading ability](https://tauruspet.med.yale.edu/staff/edm42/courses/ENAS_880_2014/papers/week-11%20connectivity/Hampson06reading.pdf)
# 12. Graph theory / network neuroscience
The brain consists of networks of brain regions all communicating with each other to varying extents. Network neuroscience refers to how researchers identify these discrete networks and their functional roles.
Perhaps the most famous brain network is the default mode network: a set of brain areas that decrease their activity during attention-demanding tasks and increase their activity when we are awake but not engaged in any specific mental task. Such a network could only be detected with whole-brain data and large-scale explorations of brain activity, in contrast to more traditional ROI-specific analysis techniques.
Although it is possible to study brain networks without graph theory, graph theory is a useful tool for more nuanced detection of brain networks and how they function. In graph theory, a network is a collection of nodes (or vertices) and their corresponding connections (or edges). The nodes are typically large brain regions (specifically "parcellations" of multiple brain areas that share some functional role), but graph theory is not limited to large-scale organization (nodes can be individual neurons) nor to functional networks (although structural networks may use different graph theory methods).
Graph theory unfortunately contains a lot of jargony terms that may seem confusing at first (a small code sketch using some of them follows this list):
* A "graph" is the full picture of all possible nodes and edges.
* A "community" or "module" is a discrete network: a mostly independent set of connected nodes within a graph.
* "Neighboring nodes" are nodes that are directly connected (i.e., reachable with a single edge).
* A "cluster" is a group of nodes that are close together in a graph, and measures of clustering quantify how densely connected a network is.
* A "hub" is a node that acts as a connector / transit point between communities.
* A "path" is a single possible route between nodes.
* "Degree" refers to the number of connections to a node.
* "Centrality" refers to how important certain nodes are in a graph (where "importance" can be defined in various ways, such as number of connections).
* "Rich club organization" refers to a group of highly interconnected nodes (i.e., more densely connected than would be expected by chance).
* "Small world networks" are networks where most nodes are not neighbors of one another, yet any node can typically be reached from any other in a small number of steps.
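A minimal sketch of some of these graph measures on a simulated connectivity matrix, assuming networkx (the sizes and the edge threshold are arbitrary):

```python
# Minimal graph-theory sketch on a simulated connectivity matrix (arbitrary
# sizes and threshold), assuming networkx.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
n_regions, n_timepoints = 20, 300

# Simulated region-by-time data and its correlation (connectivity) matrix.
ts = rng.normal(size=(n_regions, n_timepoints))
connectivity = np.corrcoef(ts)
np.fill_diagonal(connectivity, 0)

# Keep only the strongest connections as edges and build a graph.
adjacency = (np.abs(connectivity) > 0.1).astype(int)
G = nx.from_numpy_array(adjacency)

print("Degree of node 0:", G.degree[0])
print("Clustering coefficient of node 0:", round(nx.clustering(G, 0), 2))
print("Degree centrality of node 0:", round(nx.degree_centrality(G)[0], 2))
print("Number of detected communities:", len(greedy_modularity_communities(G)))
```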
## Relevant information
### Overviews of using graph theory to understand brain networks
[Sporns (2018): Graph theory methods: applications in brain networks](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6136126/)
[Simpson, Bowman, & Laurienti (2013): Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189131/)
# 13. Causal modeling / effective connectivity
Description
## Methods to investigate "causal" effects with brain data
### Randomization
The best way to study causal effects is to randomly assign participants to multiple groups, where one group experiences whatever is hypothesized to cause an effect and another group acts as a control. You can then test whether the effect occurs more frequently in the experimental group than in the control group. Unfortunately, it is not always feasible to study causal effects in this manner. For instance, if you want to test whether smoking causes lung cancer, it would be unethical to force a random sample of participants to smoke if they would otherwise not be smokers. Alternatively, perhaps your hypothesis is that the fusiform face area is causally responsible for face perception; it would be difficult or dangerous to lesion the fusiform face area of participants to test this hypothesis (note: researchers can get around such limitations by studying non-human animals or by studying humans who coincidentally have electrodes in certain brain regions as a result of seizure treatment).
Note that all of the other techniques below for studying causal effects have important limitations because they attempt to quantify causality using solely observational data. Conclusions based on these techniques are susceptible to confounding variables, and the assumptions they require (e.g., that the set of competing models is complete, or that all possible confounds or mediating variables have been accounted for) can make the resulting conclusions difficult to interpret. Fully accounting for these confounds would theoretically require comparing an infinite set of causal models that take all unobserved variables into account.
### Granger causality
Granger causality views causal effects in terms of whether past information from A helps predict current information about B. For example, imagine your ability to predict activation in parietal cortex is good when you use the participant's parietal activation from the past ten seconds to train your model. If the model performs even better when you also include the participant's visual cortex activation from the last ten seconds, then the visual cortex is said to Granger-cause parietal cortex activation. Note that "Granger-causing" is not the same as a "cause": "cause" implies that one brain region directly induces activation in the other, whereas "Granger-cause" only means that the lagged activity of one brain region improves a model of another brain region. A minimal sketch follows.
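A minimal Granger causality sketch on simulated time series, assuming statsmodels (the lag structure and noise levels are arbitrary):

```python
# Minimal Granger causality sketch on simulated time series (arbitrary lag and
# noise), assuming statsmodels. Tests whether past "visual" activity improves
# prediction of "parietal" activity beyond parietal's own past.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n_timepoints = 500

# Simulate visual-cortex activity and parietal activity that lags it by one step.
visual = rng.normal(size=n_timepoints)
parietal = 0.7 * np.roll(visual, 1) + rng.normal(scale=0.5, size=n_timepoints)

# Column order matters: the test asks whether column 2 Granger-causes column 1.
data = np.column_stack([parietal, visual])
results = grangercausalitytests(data, maxlag=2)
print("Lag-1 F-test p-value:", results[1][0]["ssr_ftest"][1])
```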
### Psychophysiological interactions (PPI)
In essence, PPI adds an interaction regressor to a general linear model: the product of a task regressor and the time series of a seed region. A significant interaction indicates that the coupling between the seed region and another region changes with task demands, which is interpreted as a task-dependent "causal" influence.
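A minimal PPI sketch on simulated time series, assuming statsmodels and a hypothetical block design (the coupling strengths are arbitrary):

```python
# Minimal PPI sketch on simulated time series (arbitrary coupling strengths),
# assuming statsmodels and a hypothetical block design.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

task = np.tile([0.0] * 15 + [1.0] * 15, 10)     # on/off block design, 300 volumes
seed = rng.normal(size=task.size)               # seed region time series
# Target region couples with the seed more strongly during task blocks.
target = 0.2 * seed + 0.6 * task * seed + rng.normal(scale=0.5, size=task.size)

# Design matrix: intercept, seed, task, and the psychophysiological interaction.
X = sm.add_constant(np.column_stack([seed, task, task * seed]))
fit = sm.OLS(target, X).fit()
print("PPI (interaction) beta:", fit.params[3].round(2), "p =", fit.pvalues[3].round(4))
```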
### Structural equation modeling (SEM)
Similar to a general linear model but instead of minimizing error between the observed variables and predictors, maximum likelihood estimation (MLE) is used to estimate parameters for typically more complex relationships of the statistical model (e.g., latent variables, mediation, causal relationships). Only the variance-covariance matrix of the variables is used for MLE, such that SEM involves specifying various possible graphical structures between variables and seeing which structure yields a variance-covariance matrix that is most similar to the actual matrix (i.e., comparing log-likelihoods between models).
### Dynamic causal modeling (DCM)
Bayesian model comparison of competing generative (encoding) models whose parameters differ in which brain regions cause subsequent changes in the activity of other brain regions. The "best" model is the one that most accurately reproduces the observed time series of neural activity, and the properties of this best-fitting model are then suggested as latent properties of reality. DCM may be preferred over the other methods because it uses probabilistic (rather than deterministic) graphical models and does not assume that random fluctuations over time are uncorrelated. However, as with all of these methods (aside from randomization), an important limitation is that it can be impossible to determine all the possible underlying influences that may be latently responsible for downstream effects.
## Relevant information
### Effective connectivity and neuroimaging
[Buchel & Friston (1997): Effective connectivity and neuroimaging](https://www.fil.ion.ucl.ac.uk/spm/doc/books/hbf1/Ch6.pdf)
### Misleading causal statements in functional connectivity
[Mehler & Kording (2018): The lure of misleading causal statements in functional connectivity research](https://arxiv.org/abs/1812.03363)
### Granger causality with Python
[Adhikary (2020): Testing for Granger Causality Using Python](https://rishi-a.github.io/2020/05/25/granger-causality.html)
# 14. Intersubject correlation (ISC)
Intersubject correlation (ISC) measures the consistency of neural responses across individuals, usually in response to naturalistic stimuli (Hasson et al., 2004, 2010). ISC relies on all subjects receiving an identical stimulus presentation. Unlike functional connectivity, which correlates time series between voxels within each subject, ISC correlates time series between subjects within each voxel.
Finn et al. (2020)
Nastase et al. (2019)
https://www.nature.com/articles/nn.4135
[A Reduced-Dimension fMRI Shared Response Model](https://www.researchgate.net/profile/Peter-Ramadge/publication/299394785_A_Reduced-Dimension_fMRI_Shared_Response_Model/links/5751993308ae17e65ec319cc/A-Reduced-Dimension-fMRI-Shared-Response-Model.pdf)
https://www.nature.com/articles/s41467-020-15874-w#Sec8
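A minimal leave-one-out ISC sketch on simulated data (sizes and noise levels are arbitrary assumptions); BrainIAK also provides ISC/ISFC implementations.

```python
# Minimal leave-one-out intersubject correlation (ISC) sketch on simulated data
# (arbitrary sizes and noise). For each voxel, one subject's time series is
# correlated with the average time series of all other subjects.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_voxels = 10, 200, 50

# Simulated data: a shared stimulus-driven signal plus subject-specific noise.
shared = rng.normal(size=(n_timepoints, n_voxels))
data = shared[None] + rng.normal(size=(n_subjects, n_timepoints, n_voxels))

isc = np.zeros((n_subjects, n_voxels))
for s in range(n_subjects):
    others = data[np.arange(n_subjects) != s].mean(axis=0)   # average of remaining subjects
    for v in range(n_voxels):
        isc[s, v] = np.corrcoef(data[s, :, v], others[:, v])[0, 1]

print("Mean leave-one-out ISC:", isc.mean().round(2))
```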
## Temporal intersubject functional correlation (ISFC)
ISFC can be used if your aim is to examine stimulus-evoked communication across brain regions. ISFC is the same as ISC except that the correlation is between two brain regions (across-participants) rather than within the same voxel.
Simony et al., 2016
## Fingerprinting
Fingerprinting refers to how individual-level functional connectivity is distinctive and relatively stable across sessions and tasks, such that it can be used to identify individuals and to predict individual differences in cognition (e.g., fluid intelligence).
https://www.nature.com/articles/nn.4135
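A minimal fingerprinting sketch on simulated data (region counts, noise levels, and the identification rule are simplifying assumptions): each subject's session-2 connectivity profile is matched to the most similar session-1 profile.

```python
# Minimal connectome fingerprinting sketch on simulated data (arbitrary sizes,
# noise levels, and identification rule).
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_regions, n_timepoints = 20, 30, 200

def connectivity_profile(ts):
    # Vectorized upper triangle of the region-by-region correlation matrix.
    corr = np.corrcoef(ts)
    return corr[np.triu_indices(n_regions, k=1)]

# Stable subject-specific signal plus independent session noise.
subject_signal = rng.normal(size=(n_subjects, n_regions, n_timepoints))

def simulate_session(s):
    noisy = subject_signal[s] + rng.normal(scale=0.5, size=(n_regions, n_timepoints))
    return connectivity_profile(noisy)

session1 = np.array([simulate_session(s) for s in range(n_subjects)])
session2 = np.array([simulate_session(s) for s in range(n_subjects)])

# Identification: does each session-2 profile correlate most with the same
# subject's session-1 profile?
similarity = np.corrcoef(session2, session1)[:n_subjects, n_subjects:]
accuracy = (similarity.argmax(axis=1) == np.arange(n_subjects)).mean()
print(f"Identification accuracy: {accuracy:.2f}")
```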
## Inter-subject representational similarity analysis (IS-RSA)
IS-RSA is the comparison of two subject-by-subject distance (or similarity) matrices, for instance one derived from brain activity (e.g., via ISC) and one derived from behavioral responses. RSA can then be performed between these two matrices.
Finn et al. (2020)
## Shared response modeling
Shared response modeling (SRM) learns a low-dimensional shared response space from data collected while multiple participants experience the same stimulus, along with participant-specific transformations that map each individual's voxels into that shared space (see the Reduced-Dimension fMRI Shared Response Model paper linked above).
## Relevant information