# Preprocessing of EEG data

In this notebook, we will go through the basics of preprocessing EEG data using MNE. We will be using some sample data provided with the MNE package to do so. To preprocess the data, the following steps are performed:

1. Load the data
2. Exclude bad channels
3. Common average reference
4. Filtering
5. Saving the data

After preprocessing, the data is epoched for further analysis. This is done in the notebook [EEG_analysis.ipynb](EEG_analysis.ipynb).

## Setting up Python

First of all, we need to make sure that we are working in the `env` environment.

1. Run `bash env_to_ipynb_kernel.sh` from the `EEG` folder if you have not already done so. This makes sure that the `env` environment is available as a kernel in this notebook.
2. Press `Select Kernel`, then `Jupyter kernel...` and select `env`. If `env` does not show up, press the little refresh symbol!

**Note:** You might have to install the Jupyter extension for VScode to be able to select the kernel.

```python
import mne
import numpy as np
import os
import matplotlib.pyplot as plt

%matplotlib inline
```

## 1. Load the sample data

To begin with, we load the MNE sample data. It contains data from an experiment where checkerboard patterns were presented to the subject in the left and right visual (hemi)field, interspersed by tones to the left or right ear. The interval between the stimuli was 750 ms. Occasionally a smiley face was presented at the center of the visual field. The subject was asked to press a key with the right index finger as soon as possible after the appearance of the smiley. Looking at trials across modalities (auditory/visual), we can see the contrast between auditory and visual processing, while inspecting the left/right trials allows us to observe the contralateral visual processing of the brain (i.e. what is presented to the right visual hemifield is processed in the left visual cortex and vice versa).
Information about the sample data can be found [here](https://mne.tools/stable/documentation/datasets.html#sample-dataset) (the dataset is called sample).

```python
# raw is an MNE object of the class Raw that contains the data
raw = mne.io.read_raw_brainvision("/work/raw/own_experiments/group4_own.vhdr",
                                  eog=('HEOG', 'VEOG'), misc=['41'])
raw.info['bads'] = []
raw.load_data()
```

### Questions

Looking at the output from the `load_data()` function, answer the following questions:

**Q1:** How many EEG channels are there?

**A:** 30

**Q2:** Were any EEG channels marked as bad during recording?

**A:** No

**Q3:** What is the sampling frequency?

**A:** 1000 Hz

**Q4:** How many minutes of data were recorded?

**A:** 00:21:36

```python
montage = mne.channels.make_standard_montage('standard_1020')
raw.set_montage(montage, verbose=False)
```

**Now let's try and plot the raw data!**

```python
raw.plot();
```

Right now we are plotting using the default argument values of the `plot()` function. Try to play around with the `Raw.plot()` method in order to:

**1. Plot all EEG channels simultaneously**

```python
raw.plot_psd()
raw.plot_psd_topo()
raw.plot_psd_topomap()
raw.plot_sensors()
```

**2. Plot a full minute of the recording**

```python
start_time = 0
duration = 60
n_channels = len(raw.ch_names)
raw.plot(start=start_time, duration=duration, n_channels=n_channels)
```

```python
raw.set_eeg_reference('average', projection=False, verbose=False)
```

## 2. Exclude bad channels

When plotting all channels simultaneously, it is evident that one of the channels is not really picking up any signal (that is, it is flat). This is a bad channel and should be excluded from further analysis. There are other ways that channels might be bad, such as being too noisy, or picking up signals from other parts of the body. In this case, we will just exclude the channel that is flat.

**Begin by marking the channel as bad using the info attribute of the raw object.
Then, plot the data again to see that the channel is now excluded.**

```python
# raw.info["bads"] = ["EEG 053"]
```

**After marking the channel as bad, we need to exclude it from the data. This is done using the `pick()` method of the raw object.**

*Hint: set `exclude='bads'`, but remember to keep the stim and the rest of the EEG channels!*

```python
# raw.pick_types(eeg=True, stim=True, exclude="bads")
```

## 3. Common average reference

The idea behind common average reference is to compute the average of the signal at all EEG electrodes and subtract it from the EEG signal at every electrode for every time point. To set this "virtual reference" that is the average of all channels, you can use `set_eeg_reference()` with `ref_channels='average'`. This is done after excluding bad channels, so the average is computed only over the good channels.

The rationale behind this is that the average of all the potentials recorded on the whole head due to current sources inside it is zero; this would make for a quiet, or electrically neutral, reference. In practice, however, such an ideal reference would require a large number of electrodes covering the whole head uniformly, which is not the case in EEG recordings, where a limited number of electrodes cover mostly the upper part of the head. If you want to know more, this is a good [resource](https://pressrelease.brainproducts.com/referencing/#20).

<div class="alert alert-block alert-info">
<b>Note:</b> The data used here has already been referenced to the average of all channels, but this code will be needed when you analyse your own data!
</div>

```python
# common average reference
# raw.set_eeg_reference('average', projection=True)

# applying the reference
# raw.apply_proj()

# plot the data to check that it looks sensible
# raw.plot();
```

## 4. Filtering

Now let's filter the data. A high-pass filter minimises slow drifts in the data (e.g. scalp potentials), while a low-pass filter excludes high-frequency noise (e.g.
line noise (50 Hz) or EMG, i.e. muscle-related artefacts), whose frequencies are higher than those of the signal we are interested in.

**Apply a high-pass filter at 0.1 Hz and a low-pass filter at 40 Hz, following the typical practices of EEG preprocessing.**

*Hint: The `Raw` class has a `filter()` method that can be used to filter the data.*

```python
raw.filter(l_freq=0.1, h_freq=40)
```

**Plot the data to inspect the effect of the filtering**

```python
raw.plot()
```

### Question

**Q5:** If you compare the raw data with the filtered data, what differences do you see?

**A:**

## 5. Saving the preprocessed data

```python
# import os
# output_path = os.path.join(os.getcwd(), "preprocessed_data")

# make sure output path exists
# if not os.path.exists(output_path):
#     os.mkdir(output_path)

# save the data
# raw.save(os.path.join(output_path, 'preprocessed_data.fif'), overwrite=True)
```

## Epoching of EEG data

Now that the data has been preprocessed, we can epoch it. Epoching is the process of cutting the continuous data into smaller segments, called epochs. Each epoch is a time window of the data, centered around an event of interest. In the MNE toolkit, the `Epochs` class is used to represent data that has been segmented into epochs, and it provides methods for averaging, baseline correction, plotting, and so forth. For a great overall introduction, see MNE's [epoch overview](https://mne.tools/stable/auto_tutorials/epochs/10_epochs_overview.html).

**To begin with, we need to locate the events. Since the events in this recording are stored as annotations, this can be done using the `events_from_annotations()` function.**

```python
events, event_dict = mne.events_from_annotations(raw)
events[0]
events[1]
print(events)
```

**Create a dictionary of what each event ID represents.**

I have already done this for you, but you can also do it yourself by looking at the events in the raw data and the documentation of the sample data. By using '/' we can later index one dimension *across* the other, i.e. if we just write 'auditory' we get all auditory events, both to the left and right ear.
If we write 'auditory/left' we get only the events presented to the left ear.

```python
event_id = {
    'Target/Con': 101,
    'Target/Inc': 102,
    'Nontarget': 103
}
```

**Visualise the events by using the `mne.viz.plot_events()` function**

```python
mne.viz.plot_events(events, event_id=event_id);
```

**Establish a time window for the epochs.**

One suggestion is to use a time window from 200 ms before stimulus onset to 500 ms after stimulus onset. The 200 milliseconds before stimulus onset enable us to examine a baseline of activity without stimulus presentation. The 500 milliseconds after stimulus onset denote the time in which we expect the effect to occur.

**Establish a threshold for rejecting epochs.**

There are many ways to detect and deal with artefacts. Today, we simply select a value and reject anything above or below it, because we determine that values beyond this threshold are likely not related to brain activity. There are additional ways to deal with artefacts, such as ICA (presented as an optional exercise in the `EEG_preprocessing` notebook).
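The `reject` criterion used below works on peak-to-peak amplitude within each epoch: if any channel's peak-to-peak value exceeds the threshold, the whole epoch is dropped. As a minimal numpy-only sketch of the idea (toy array shapes and values, not MNE's actual implementation):

```python
import numpy as np

def reject_epochs(data, threshold):
    """Keep only epochs whose peak-to-peak amplitude stays below
    `threshold` (in volts) on every channel.

    data: array of shape (n_epochs, n_channels, n_times)
    """
    ptp = data.max(axis=-1) - data.min(axis=-1)  # (n_epochs, n_channels)
    keep = (ptp < threshold).all(axis=1)         # drop epoch if ANY channel exceeds
    return data[keep], keep

# toy data: 3 epochs, 2 channels, 5 time points, baseline ptp of 50 µV
data = np.zeros((3, 2, 5))
data[:, :, 0] = 50e-6
data[1, 0, 2] = 500e-6  # inject a large artefact into epoch 1

clean, keep = reject_epochs(data, threshold=150e-6)
print(keep)         # → [ True False  True]
print(clean.shape)  # → (2, 2, 5)
```

MNE applies the same logic per channel type via `reject={'eeg': 150e-6}`, so only the affected epochs are lost, not whole channels.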
**Create a `picks` variable with only EEG channels**

```python
picks = ['eeg']
```

**Create the epochs by using the `mne.Epochs` class**

* Use the `reject` argument to reject epochs that are above or below the threshold we set earlier
* Use the `tmin` and `tmax` arguments to set the time window for the epochs
* Use the `baseline` argument to set the baseline for the epochs (from -0.2 seconds to 0 seconds relative to the event onset)

*Hint: See this [link](https://mne.tools/stable/generated/mne.Epochs.html#mne.Epochs) on the `Epochs` class*

```python
tmin = -0.1
tmax = 0.7
reject_criteria = {'eeg': 150e-6}  # 150 µV in volts
baseline_period = (tmin, 0)

epochs = mne.Epochs(raw, events, event_id=event_id, tmin=tmin, tmax=tmax,
                    baseline=baseline_period, reject=reject_criteria, preload=True)
print("Created {} epochs".format(len(epochs)))
```

## Downsample the data

To reduce the amount of data we have to work with, as well as the time it takes to run the analysis, the data can be downsampled. This is done after epoching, as doing it before epoching can reduce the temporal precision of the event extraction.

**Resample the data to 250 Hz**

DON'T RESAMPLE!

```python
# epochs_resampled = epochs.resample(250., npad='auto')
# print("Data resampled to {} Hz".format(epochs_resampled.info['sfreq']))
# epochs_resampled.plot()
```

## Save the epochs

We will save the epochs as a `.fif` file. This is a binary file format specific to MNE that is very fast to read and write, and it stores all the metadata we need to keep track of the data.

```python
# outpath = os.path.join(os.getcwd(), 'epochs')
# if not os.path.exists(outpath):
#     os.makedirs(outpath)
# epochs.save(os.path.join(outpath, 'epochs-epo-wed.fif'), overwrite=True)
```

# ERP analysis

Now that the data is epoched, we can start to analyse it.

**Group the epochs by modality (i.e.
visual and auditory)**

*Hint: See this [link](https://mne.tools/stable/auto_tutorials/epochs/10_epochs_overview.html#subselecting-epochs) on how to subselect epochs*

```python
# Grouping epochs by condition
Target_c_epochs = epochs['Target/Con']
Target_nc_epochs = epochs['Target/Inc']
Nontarget_epochs = epochs['Nontarget']

Target_c_epochs.plot_image(picks=['T7']);
Target_nc_epochs.plot_image(picks=['T7']);
```

## Evoked responses

Unlike `Epochs`, which contain multiple trials that are each associated with a condition label (that is, the event ID), `Evoked` objects are averages across trials for a single condition. Thus, we have to create a separate `Evoked` object for each condition in our experiment.

**Create an Evoked object for each modality**

*Hint: use the `average()` method of the `Epochs` class*

```python
evoked_target_c = Target_c_epochs.average()
evoked_target_nc = Target_nc_epochs.average()
evoked_Nontarget = Nontarget_epochs.average()
```

**Check the shape of the Evoked objects**

```python
# word_epochs.get_data().shape
# image_epochs.get_data().shape
```

**Compare the two evokeds**

*Hint: use `mne.viz.plot_compare_evokeds()`*

```python
mne.viz.plot_compare_evokeds({"Target/Con": evoked_target_c,
                              "Target/Inc": evoked_target_nc})
```
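Conceptually, `average()` just takes the per-timepoint mean across a condition's trials. A minimal numpy sketch with toy data (hypothetical values, not MNE's API) shows why trial-locked signal survives averaging while trial-varying noise shrinks towards zero:

```python
import numpy as np

# toy epochs: 4 trials, 2 channels, 3 time points
epochs_data = np.array([
    [[1., 2., 3.], [0., 0., 0.]],
    [[3., 2., 1.], [0., 2., 0.]],
    [[1., 2., 3.], [0., 0., 0.]],
    [[3., 2., 1.], [0., 2., 0.]],
])

# averaging across trials (axis 0) yields the evoked response,
# shape (n_channels, n_times)
evoked = epochs_data.mean(axis=0)
print(evoked)  # → [[2. 2. 2.]
               #    [0. 1. 0.]]
```

Activity that is not time-locked to the event averages out across trials, which is exactly why comparing `Evoked` objects between conditions (as in `plot_compare_evokeds` above) isolates the event-related response.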