# PracticalMEEG-2025 MNE-Python, archive of day 4
## Day 4: Multivariate analysis
#### **Ice breaker:**
- Did you use AI this week?
- no: :robot_face::robot_face::robot_face:
- yes: :robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face:
- And what for?
- installation problems :robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face::robot_face:
- scientific concepts :robot_face:
- python questions :robot_face::robot_face:
- completing my lecture notes :robot_face:
- cultural explanations during a trip :robot_face:
- X
- X
- X
- all of the above: :robot_face::robot_face:
#### :bangbang: For the last day, we've added some more exercises to today's notebook. Please re-download!!: https://github.com/wmvanvliet/mne_practical_meeg_2025/archive/refs/heads/main.zip
#### most important link: PracticalMEEG2025 playlist: https://open.spotify.com/playlist/1EVtKZrp3r9fn7ALUbpxp7?si=HH06rGyaQy60M5W7zEdZ2Q
(Alex is working on a YouTube one, ready soon!)
:+1: :heart:
## Citing MNE-Python
We hope you've enjoyed using MNE-Python this week. If you use it for your analysis and publish a paper about it, don't forget to cite all the software packages that you used so the authors receive proper credit. For MNE-Python, cite these sources:
- Larson, E., Gramfort, A., Engemann, D. A., Leppakangas, J., Brodbeck, C., Jas, M., Brooks, T. L., Sassenhagen, J., McCloy, D., Luessi, M., King, J.-R., Höchenberger, R., Brunner, C., Goj, R., Favelier, G., van Vliet, M., Wronkiewicz, M., Rockhill, A., Appelhoff, S., … user27182. (2025). MNE-Python (v1.10.1). Zenodo [doi:10.5281/zenodo.16794835](https://doi.org/10.5281/zenodo.16794835).
- Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Goj, R., Jas, M., Brooks, T., Parkkonen, L., & Hämäläinen, M. S. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7(267), 1–13. [doi:10.3389/fnins.2013.00267](https://doi.org/10.3389/fnins.2013.00267).
For more information, see: https://mne.tools/stable/documentation/cite.html
## Session questions
- do you think we could get a summary of the references used in the lectures / recommended further reading from our speakers?
- Natalie: that sounds feasible, let's try to come up with this. Maybe when the slides are sent around, or we can put it in the forum later!
- basics: 
- MEG/EEG primer book has the best figures and great for general MEG/EEG instrumentation: https://academic.oup.com/book/46874
- Mike X Cohen's book is great for data analysis: https://mitpress.mit.edu/9780262319560/analyzing-neural-time-series-data/ The book uses MATLAB, but the corresponding Python code is available online, and he also has lectures on YouTube
- awesome thank you! :heart_eyes_cat:
- Haufe paper: https://www.sciencedirect.com/science/article/pii/S1053811913010914
- MNE-Python tutorials: https://mne.tools/stable/auto_tutorials/index.html
- I did not get this patterns & filters aspect at all
- This is a complex topic to grasp intuitively. The Haufe paper is the best resource for learning about this: https://www.sciencedirect.com/science/article/pii/S1053811913010914
- :white_check_mark: **when could we get our slides :)**
- The organizers should share them soon, you can probably get more info from them
- :white_check_mark: Question not very related to the topic today: any tips for manually annotating events? If events.tsv contains events that are not in any channel in our data.
- Which events do you want to annotate manually? Usually, we try to send triggers automatically to achieve better synchronization with recorded M/EEG data.
- I have audio data (I have when the audio starts), but I would need to annotate when each word/syllable starts in the MEG data.
- Ah I see, thanks for clarifying. For this, you can create the `events` array yourself (3 columns: sample index, previous trigger value, event code) based on the audio, and add it to your Raw data using this function: [link](https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.add_events). There might be some examples in the MNE documentation; a minimal sketch also follows below this thread.
- Thanks a lot!!
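A minimal sketch of the suggestion above, assuming `raw` is already loaded and `word_onsets_sec` holds hypothetical word-onset times (in seconds) extracted from the audio; the event code `42` and the stim channel name `STI101` are placeholders for your own setup:

```python
import numpy as np
import mne

# raw = mne.io.read_raw_fif("my_recording_raw.fif", preload=True)  # your data
word_onsets_sec = np.array([1.2, 1.8, 2.5])  # placeholder onsets from the audio

# MNE events: [sample index, previous trigger value, event code]
samples = (word_onsets_sec * raw.info["sfreq"]).astype(int) + raw.first_samp
events = np.column_stack([
    samples,
    np.zeros(len(samples), dtype=int),     # previous trigger value (unused here)
    np.full(len(samples), 42, dtype=int),  # arbitrary code for "word onset"
])
raw.add_events(events, stim_channel="STI101")  # adjust to your stim channel
```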
- :white_check_mark: would you equalize the event counts also for subcategories - for ex. between famous and unfamiliar in this case?
- Ideally, yes.
- But the importance of balance between subcategories might also depend on the research question, I think. If the ERP is similar for all subcategories in a certain time window, then a slight imbalance should also be fine (maybe?).
- :white_check_mark: **Is there a reason why we only get magnetometers here? Can we not just use all channels, and if some sensors are not informative, the classifier won't pick them up anyway?**
- For this tutorial, we use only a subset of channels to speed up the computation. Using everything should also work, but the data needs to be standardized (i.e., brought to the same scale) when you combine different types of sensors (e.g., grad+mag or MEG+EEG). Otherwise, the classifier will simply prioritize the sensors with higher variance, which do not necessarily carry more information.
- It is also not true that an uninformative sensor is simply not "picked up" by the classifier. The classifier will try to project the data away from all noise sources, and hence will assign non-zero weights even to non-informative sensors in order to cancel noise.
- :white_check_mark: is there a way to run a "power analysis" to estimate how many trials do we need to have reliable results?
- The answer might depend strongly on the type of method that you plan to use, so I'm not sure whether there is a general approach. However, you can always try simulating data with your effect of interest and find the desired number of trials empirically by evaluating the power for different numbers of trials.
- :white_check_mark: what does equalize do exactly? downsampling/upsampling with which strategy?
- Equalizing makes sure that the decoding performance does not come from the different numbers of trials, which would give one condition a larger weight in the classifier.
- Certainly not upsampling - events are only removed. The strategy depends on the value of the `method` argument; feel free to check the details in the docs: [link](https://mne.tools/stable/generated/mne.Epochs.html#mne.Epochs.equalize_event_counts). A usage sketch follows below this thread.
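A minimal usage sketch, assuming `epochs` was created earlier and using the hypothetical condition names "famous" and "unfamiliar":

```python
# Drops epochs from the larger condition until the counts match;
# method="mintime" (the default) keeps the retained events close in time.
epochs.equalize_event_counts(["famous", "unfamiliar"])
print(epochs)  # both conditions now contain the same number of epochs
```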
- :white_check_mark: Is it possible to use different types of psychophysiological data (e.g., EEG + ECG) directly within `X`, considering that the Scaler will "normalize" them? Or should there be only one "type" of data (brain data)?
- It should be possible, yes. And you're right that normalization is very important to give both types of data a chance to contribute to the predictions. There might be other challenges that don't come to mind immediately, but it is certainly possible in general.
- :white_check_mark: we set the `C` value to a fixed value here; can you talk about what the `C` parameter changes, and does it make sense to do hyperparameter tuning?
- It controls the regularization. Without regularization, the model would try to fit the training data perfectly. But since the data might be noisy, the model might reproduce the noise as well and not generalize well as a result. In scikit-learn, `C` is the inverse of the regularization strength: the smaller `C` is, the more regularization we apply, and the more the model is forced to capture only the general patterns in the data. It certainly makes sense to do hyperparameter tuning with a train/validation/test dataset split (see the sketch below this thread).
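A minimal sketch of tuning `C` with cross-validation, assuming `X` and `y` are the vectorized epochs data and labels from earlier in the notebook:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},  # smaller C = stronger regularization
    cv=5,
)
grid.fit(X, y)  # ideally on a training set only, keeping a hold-out test set
print(grid.best_params_, grid.best_score_)
```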
- :white_check_mark: Maybe because I am new to the concept, but can you do cluster analysis between more than 2 conditions? And would this make sense?
- Yes, please check here: https://mne.tools/stable/auto_tutorials/stats-source-space/60_cluster_rmANOVA_spatiotemporal.html
- :white_check_mark: Sorry, I don't understand the type of scaling applied to the data: we use `Scaler(epochs.info)`, but I did not understand exactly what it does
- Different sensors have different scales (i.e., amounts of variance): MEG typically detects fields on the order of 10^(-15) T, while EEG detects potentials of tens of microvolts (10^(-6) V). If we just combine the data from all sensors "as is", the MEG information will effectively be discarded because its values are very small. `Scaler` makes sure that the variance of the different types of sensors is similar (see the sketch below this thread).
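A minimal sketch of where `Scaler` sits in the pipeline, assuming `epochs` from the tutorial:

```python
from mne.decoding import Scaler, Vectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    Scaler(epochs.info),  # scales each channel type (mag/grad/eeg) separately
    Vectorizer(),         # flattens channels x time points into one feature vector
    LogisticRegression(solver="liblinear"),
)
```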
- :white_check_mark: Can we do dimensionality reduction in the pipeline? Will it affect the results? Is it better?
- Yes, we can. How it affects the results is hard to predict in general. It can certainly help when there is little data available, by helping the model focus on meaningful patterns in the data.
- If we can reduce the number of features, i.e., the number of sensors and time points, in such a way that we only keep the most informative ones, then the accuracy should improve. In this case, we are helping out the algorithm by selecting only useful features (see the sketch below this thread).
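A minimal sketch of one way to do this, assuming `epochs` from the tutorial; here PCA is applied across channels at each time point via MNE's `UnsupervisedSpatialFilter` (the number of components is an arbitrary choice):

```python
from mne.decoding import Scaler, UnsupervisedSpatialFilter, Vectorizer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    Scaler(epochs.info),
    UnsupervisedSpatialFilter(PCA(n_components=30), average=False),  # fewer "channels"
    Vectorizer(),
    LogisticRegression(solver="liblinear"),
)
```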
- :white_check_mark: for the cross validation, is it better to do a repeated k-fold CV?
- It might give a more stable estimate due to repetitions but it would still be important to ensure no leakage of training data into the test set.
- :white_check_mark: why can we not interpret the filters?
- Imagine data with 2 channels. The first channel measures a useful signal plus noise, while the second channel measures only noise. Then, it would make sense for the model to subtract the second channel from the first one (filter `[1, -1]`) to get the clean signal. So both channels get a similar weight (in terms of absolute value), but that does not mean that the second channel contains a useful signal (a toy simulation of this example follows below this thread).
- We **cannot** say: this sensor/time-point has a large weight, hence here is where there is useful information, this other sensor/time-point has a small weight, hence here is no useful information.
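A toy simulation of the two-channel example above (not from the notebook): the filter weights both channels equally strongly, but the Haufe-style pattern, `Cov(X) @ w`, correctly shows that only channel 1 carries the signal:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)
noise = rng.standard_normal(10_000)
X = np.vstack([signal + noise, noise])  # channel 1: signal + noise; channel 2: noise

w = np.array([1.0, -1.0])  # filter: subtract channel 2 from channel 1
s_hat = w @ X              # recovers the clean signal

pattern = np.cov(X) @ w    # Haufe et al. (2014) transformation
print(pattern)             # approx. [1, 0]: the signal projects only to channel 1
```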
- :white_check_mark: Is it interesting to play with hyperparameters?
- Not only interesting but also required to squeeze out the maximum performance from the model :)
- :white_check_mark: isn't doing CV over all the data somehow wrong? I guess the model overfits since it sees data from the same subjects both in train and test (or validation). Wouldn't it make more sense to divide the data based on the subjects?
- Yes, you're absolutely right - the final evaluation should ideally happen on a hold-out test set.
- :white_check_mark: Is the scaling just L2 normalization? If not, what is it?
- More or less. The `Scaler` object can also subtract the mean of the data if needed. But the main purpose is to normalize the variance, which is quite similar in spirit to L2 normalization.
- :white_check_mark: Why 5 folds and not 10?
- Mostly, for the purpose of demonstration and speed. Feel free to adjust the number of folds for your own data.
- :white_check_mark: are we doing the analysis timepoint by timepoint here?
- can you clarify your question a bit?
- is the classifier fitted on every time point and tested on every time point or are we approaching the 400ms window as a whole and the classifier is trained/tested on the whole period?
- We did it the second way - the `Vectorizer` object flattens the data and creates one vector with the data from all channels and time points, which are used simultaneously for classification. One could, of course, also apply the classifier to each time point separately. See the "Decoding over time" section of the notebook and the sketch below this thread.
- :white_check_mark: follow-up: so we have each combination of channel-timepoint as a separate predictor with its own parameter?
- In the beginning of the notebook - yes.
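A minimal sketch contrasting the two approaches, assuming `epochs` and the labels `y` from earlier in the notebook:

```python
from mne.decoding import Scaler, SlidingEstimator, Vectorizer, cross_val_multiscore
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = epochs.get_data()  # n_epochs x n_channels x n_times

# 1) Whole window at once: flatten channels x time points into one feature vector
whole_clf = make_pipeline(Scaler(epochs.info), Vectorizer(),
                          LogisticRegression(solver="liblinear"))

# 2) Time point by time point: one classifier fitted per time sample
time_clf = SlidingEstimator(
    make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear")),
    scoring="roc_auc",
)
scores = cross_val_multiscore(time_clf, X, y, cv=5)  # shape: n_folds x n_times
```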
- :white_check_mark: I got Spatio-temporal: 66.66666666666666%, not the same as in the demonstration - is that normal?
- I got 63.33333333333333%, so maybe it depends on how the model works? True
- Splitting into folds is random, so the results might fluctuate a bit unless the seed of the random number generator is fixed (see the sketch below this thread).
- :question: Also, event selection for balancing the conditions might not be the same, right?
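A minimal sketch of fixing the cross-validation randomness so the numbers are reproducible, assuming `clf`, `X`, and `y` from the notebook:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fixed seed
scores = cross_val_score(clf, X, y, cv=cv)  # identical across reruns
```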
- :white_check_mark: Why do we need `lm = LinearModel(logreg)`? Why don't we use the `logreg` itself?
- `LinearModel` is a wrapper provided by MNE that adds functionality specific to M/EEG data: plotting filters/patterns (see the sketch below this thread). The actual decoding is performed by the `logreg` object.
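A minimal sketch, assuming `X_vec` is the vectorized epochs data and `y` the labels; `get_coef` pulls the filters and patterns out of the fitted wrapper:

```python
from mne.decoding import LinearModel, get_coef
from sklearn.linear_model import LogisticRegression

lm = LinearModel(LogisticRegression(solver="liblinear"))
lm.fit(X_vec, y)
filters = get_coef(lm, "filters_")    # the decoding weights
patterns = get_coef(lm, "patterns_")  # the corresponding activation patterns
```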
- :white_check_mark: I didn't get how we can improve the Linear Regression model (going from 83 to 96). Is it possible to show the change made in the code?
- Switch the model from `lda` to `logreg` in the line `lm = LinearModel(lda)`
- We also set the `C` parameter of the `LogisticRegression` object.
- :question: why is the unit of the lower plot still fT? Shouldn't it be accuracy over time, like representing the decoding?
- on which line of the code?
- visualizing filters and patterns line 28
- No, the values are unitless/arbitrary - these are the weights with which we combine the channels (filter), or with which the extracted pattern projects to all channels (pattern)
- but what is the time course below the topo weight plots? It has the unit fT. I didn't get what we plotted
- Ah sorry, maybe I confused the plot. Could you please post a screenshot?
- *(screenshot of the plot was posted here)*
- Thanks! Here we plot the patterns. I believe the fT units don't apply: just for the sake of plotting, we packed the pattern values into an `EvokedArray` object, which carries fT as its unit, but the values themselves should be unitless.
- :white_check_mark: I don't understand why these lines are for spatial filtering:
```python
spatial_filter = np.zeros(len(epochs.ch_names))  # one weight per channel, all zeros
spatial_filter[picks[0]] = 1  # keep only the first picked channel
```
- You first create an array `spatial_filter`, initialized with zeros `[0 0 0 ... 0]` and with the same length as the number of channels in the epochs data. Then, based on your picks, you set some of the values to 1 to select those channels. A few lines later there is a multiplication: whatever is multiplied by 0 is simply taken away, i.e., spatially filtered out (the multiplication is sketched below this thread).
- This spatial filter is equivalent to just using the data from one channel (with the index `picks[0]`). So essentially there is no filtering as such yet, but this is the simplest filter one could come up with.
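A minimal sketch of the multiplication mentioned above, assuming `epochs` and the `spatial_filter` array from the notebook; channels with weight 0 drop out of the weighted sum entirely:

```python
import numpy as np

data = epochs.get_data()  # n_epochs x n_channels x n_times
# weighted sum over the channel axis -> one "virtual channel" per epoch
filtered = np.einsum("c,ect->et", spatial_filter, data)  # n_epochs x n_times
```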
- :white_check_mark: based on the above answer: so basically spatially filtering the data means computing the cov matrix and silencing one or some of the other sensors?
- Yeah, that's how I understood it. I'm not one of the trainers, but the code does exactly what you said. I would add that it does not necessarily need to be based on cov matrices; you can use this technique on raw data as well if you are interested in only certain channels, but I might be wrong about this definition.
- Somewhat along those lines. Not every filter requires a covariance matrix, but the idea of spatial filtering is indeed to enhance a certain component that is otherwise buried in our multivariate data but could be useful for classification.
- :white_check_mark: in theory, can we look at the spatial patterns from one part of our data and choose only the sensors that seem informative for another part of the data (for example, do the decoding on the "encoding" phase, check the patterns from that, and use those channels in the "maintenance" phase)? Is this double dipping already? Because they would be different data
- I would not recommend that. Channels that have low values in the pattern might still have high values in the filter, because they are useful for removing some noise from the data. But it is certainly plausible to train the filters on one part of the data and apply them to another part.
- :white_check_mark: so in the previous part, if I got it right, the filter is calculated via machine learning across epochs
- In general, yes. But since in the first part we combined the data from all time points, the filter becomes spatiotemporal and is harder to visualize and interpret.
- :white_check_mark: I dont understand what are the patterns? how are they different form filters?
- I hope that it was addressed on stage. If more information is needed, please check [link](https://doi.org/10.1016/j.neuroimage.2013.10.067)
- :white_check_mark: Where the logreg is defined, it talks about regularization, but how is it implemented? Is it the `C=1e6` in `LogisticRegression(C=1e6, solver='liblinear')`?
(removed part of the question cause it has been asked before :)
- Yes! Regularization is implemented by adding a penalty to the model when the weights become too large. The strength of the penalty depends on `C`: in scikit-learn, `C` is the inverse of the regularization strength, so `C=1e6` means almost no regularization (the objective is sketched below this thread).
- Thank you!!
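For reference, a sketch of the L2-regularized objective that scikit-learn's `LogisticRegression` minimizes; `C` multiplies the data-fit term, so a large `C` makes the penalty negligible:

$$\min_{w}\ \frac{1}{2}\,\|w\|_2^2 \;+\; C \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i\, w^\top x_i}\right)$$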
- :white_check_mark: I don't understand what the aim of spatial filters is?
- A spatial filter means that we take a weighted sum of the data across the channels that we have. By combining the data from multiple channels, we can enhance a signal of interest that is consistently present in these channels and suppress noise that is present in only some channels.
- :white_check_mark: How did you choose the value -0.25?
- The Laplacian filter subtracts the average of the 4 neighboring channels, so each neighbor has a weight of 1/4 = 0.25; the minus sign indicates subtraction (see the sketch below this thread).
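A minimal sketch of such a Laplacian-style filter, with hypothetical channel indices for the center channel and its four neighbors:

```python
import numpy as np

n_channels = 64
center, neighbors = 10, [9, 11, 18, 2]  # placeholder indices
w = np.zeros(n_channels)
w[center] = 1.0
w[neighbors] = -0.25  # subtract the average of the 4 neighbors (1/4 each)
# laplacian = w @ data   # with data of shape n_channels x n_times
```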
- :white_check_mark: if we have 150 subjects and we can't do permutation tests because of the computation, what can we do at the group level to test significance? Is a sign-flip / cluster-based permutation test a good idea with the scores of a group?
- I think that once you average across participants, the permutation test is not possible anymore. So you would likely still need subject-specific data points, but you could try to reduce the set of channels / time points, maybe? Or keep computing :smiling_face_with_tear:
- :white_check_mark: What about confusion matrix and computing classification parameters such as precision accuracy AUC?
- These metrics could also be useful for evaluation, especially if it is not possible to balance the classes easily.
- :white_check_mark: how many permutations should we do (in terms of maximum) and what are the implications of doing too many permutations ?
- In short: at least 1000; 5000 is ideal. As the number of permutations increases, the RAM requirement also increases, so take the number of channels, time points, and conditions into account before running permutations in daily practice. Note that doing many permutations is not harmful statistically, only computationally: the smallest achievable p-value is limited by the number of permutations (roughly 1/n_permutations), so too few permutations limit the resolution of your p-values.
- :white_check_mark: I still don't understand the "p-value = 0" (from the exercise)
- We trained 100 models with randomly permuted labels. None of the 100 random models performed better than our "real" model, hence the reported p-value is 0/100 = 0. Strictly speaking, the p-value is limited by the number of permutations, so it would be more accurate to report p < 1/100 = 0.01 rather than exactly 0.
- :white_check_mark: Why do we use other classification methods such as random forest, SVM, ANN, discriminant analysis, etc.?
- Feel free to use them! Today we just demonstrate some out of many possible examples.
- :white_check_mark: So for the sliding estimator, is this timepoint by timepoint or is a window (e.g., +/- 0.05s) around each timepoint?
- From the [documentation](https://mne.tools/stable/generated/mne.decoding.SlidingEstimator.html#mne.decoding.SlidingEstimator.fit), it seems that it is timepoint by timepoint. But please double-check.
- :white_check_mark: Is there any reason why you're plotting the ROC_AUC instead of the mean accuracy across the sliding window?
- ROC AUC might be suitable in case of unbalanced classes. For today's example, both metrics should show similar values.
- :white_check_mark: What exactly is the difference between decoding over time and ERP analysis? What are the advantages and disadvantages of each approach?
- It depends on your research question. ERPs already have a lot of literature behind them. If you are new to EEG, ERPs can provide you with a basic understanding of stimulus-induced time-series responses.
- Advantage of machine learning: it can be more sensitive to differences in experimental conditions because it can pool data from multiple sensors/time-points and reveal things that would otherwise be hidden.
- :white_check_mark: wouldn't it be more informative to plot the ROC curve above the random chance diagonal?
- I missed the context, but for the AUC, the random-chance diagonal has an AUC of 0.5, so the part above chance has an area of AUC − 0.5 - just subtracting a constant. Feel free to follow up if I misunderstood the question
- :question: How can we draw ROC curves that consist performance curve that AUC values especially?
- Sorry, I can't understand the question. Could you please rephrase it?
- :white_check_mark: why is the plot so different between filters and patterns ?
- Filters might subtract noise from distant channels to improve the signal-to-noise ratio, while patterns show how the resulting extracted signal projects to sensor space. Since filters and patterns show different things, they can look quite different. You can apply source reconstruction / dipole fitting to the topomaps of patterns to get the location of the source that corresponds to the pattern.
- :white_check_mark: When applying SSD with constraint on alpha activity, why did we minimize activity from 7 and 13Hz as well? Is it necessary to define the minimizing constraint or just that we really wanted to single out activity from the alpha band?
- SSD works by maximizing the contrast between the target band (8-12 Hz in our case) and the neighboring frequencies (using 1-Hz windows around 7 and 13 Hz). So defining both bands (for signal and noise) is necessary.
- :white_check_mark: Got it! But a follow-up: given the individual differences in alpha peaks (and essentially any other oscillations), wouldn't we actually suppress the "alpha" of some participants if we apply this general alpha filter to everyone? In practice, should we define the individual peak frequency first before applying the filter?
- Interesting idea, but there are multiple factors to consider. First of all, SSD will extract multiple alpha components (through multiple spatial filters) even within one participant. The components will also likely differ in terms of peak frequency and the topomap of the spatial pattern. So component #1 for subject #1 might easily mean something completely different from component #1 for subject #2 (e.g., motor vs. occipital alpha). This is important to consider when applying SSD to multiple subjects. Otherwise, adjusting the fitting range of SSD based on the peak frequency could certainly be meaningful. There is also an option to fit SSD on the grand-average covariance matrices of all participants to get the same set of filters for all participants, but the resulting filters might then be dominated by a few subjects with a very high signal-to-noise ratio of alpha (a fitting sketch follows below this thread).
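A minimal fitting sketch along the lines of the MNE SSD example, assuming `raw` contains the continuous data; the 1-Hz transition bands implement the 7/13 Hz flanking-noise constraint discussed above, and `n_components=5` is an arbitrary choice:

```python
from mne.decoding import SSD

ssd = SSD(
    info=raw.info,
    filt_params_signal=dict(l_freq=8, h_freq=12,
                            l_trans_bandwidth=1, h_trans_bandwidth=1),
    filt_params_noise=dict(l_freq=7, h_freq=13,
                           l_trans_bandwidth=1, h_trans_bandwidth=1),
    n_components=5,
)
ssd.fit(raw.get_data())
alpha_sources = ssd.transform(raw.get_data())  # extracted alpha components
```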
- :white_check_mark: You mentionned the permutation method to assess decoding accuracy at the individual level, what about statistic at the group level? After averaging the temporal decoding of all subjects?
- In this case, you can perform a test across the decoding accuracies of the subjects, very much like you would do in a sensor-level analysis (see the sketch below this thread).
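A minimal sketch of such a group-level test, assuming `scores` is a hypothetical (n_subjects × n_times) array of per-subject decoding AUCs over time:

```python
from mne.stats import permutation_cluster_1samp_test

X = scores - 0.5  # center on chance level (AUC = 0.5)
t_obs, clusters, cluster_pv, H0 = permutation_cluster_1samp_test(
    X, n_permutations=1000)
print([i for i, p in enumerate(cluster_pv) if p < 0.05])  # significant clusters
```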
- :white_check_mark: What is plotted on the y-axis of the weights and pattern plots?
- This depends on the scaling/normalization we applied. They are usually z-scored values, so A.U. (arbitrary units). Only if you undo the normalization can you translate these values back into a unit such as Volts or Tesla.
## :bangbang: can we ask more specific questions to our analysis here? or maybe have a new section for this?
- Yes! Put your question here, and the instructors will return to them shortly after the workshop!
- :white_check_mark: I did source reconstruction using a parcellation called Conte69, and I get the following labels for the sources: L_21_B05_02, R_7_B05_11, etc. I'm not really sure what these labels mean, and I haven't been able to find it anywhere! L and R are left and right, and I think "B05" is probably a Brodmann area, but the rest? Are they maybe following some convention I don't know about? Thanks a lot!
- This information should be present in some materials that describe this parcellation (the paper or a README). Otherwise, the names are too obscure to make a reasonable guess. Alternatively, you may consider other parcellations supported by MNE, e.g., the Desikan-Killiany atlas (`aparc`).
---
**:arrow_up: Add new questions just above this line :arrow_up:**
## Please put your feedback for the whole workshop below:
- Now I can impress my supervisor with the cool skills I learned :+1:
- Too confusing for me but I will try to spend more time with the materials
- Fantastic week! Definitely needed to consolidate some concepts and learn new methods! :+1: :+1: :+1:
## PracticalMEEG2025 people on bluesky:
- https://go.bsky.app/HcswbiV
- if you have a science-related one, put it here or let me know otherwise and I'll add you to the list :)
---
## :bangbang: More "stupid" questions please! Don't be afraid to ask about things that are unclear to you! Feel free to use the mics as well