This project is about creating a full machine learning pipeline that trains and predicts brain activity using EEG data.
EEG, aka electroencephalography, is a non-invasive neuroimaging technique that measures electrical activity in the brain and is widely used in cognitive research.
It is recorded by placing a cap with a number of electrodes on a person's scalp. As the subject experiences certain stimuli, certain regions of the brain show higher or lower electrical activity, and that change in activity is captured by the electrodes over the corresponding region.
Raw EEG is represented as time series data (the plot of each electrode's amplitude against time). This way of interpreting the signal is known as the time domain.
An alternative way to view this EEG data is through the frequency domain. This is done by applying a mathematical transformation (typically the Fast Fourier Transform) to obtain the frequencies that make up the time domain wave, along with their amplitudes.
With the derived frequencies we can build something called a power spectrum, which gives a high-level overview of the frequencies and their powers that compose the original signal. This power spectrum will be used for data filtering later on.
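As a rough illustration, the sketch below builds a power spectrum for a single channel with NumPy. The sampling rate `sfreq` and the synthetic `signal` array are hypothetical stand-ins for one electrode's recording, not values from this project.

```python
import numpy as np

# Hypothetical single-channel recording: 10 seconds sampled at 256 Hz,
# a fake 10 Hz rhythm plus noise standing in for real EEG.
sfreq = 256.0
t = np.arange(0, 10, 1 / sfreq)
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

# Fast Fourier Transform of the time-domain signal.
fft_vals = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sfreq)

# Power spectrum: squared magnitude of each frequency component.
power = np.abs(fft_vals) ** 2

# `freqs` and `power` together form the power spectrum of this one channel.
```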
Keep in mind that the diagram above only shows the spectrum of one channel (electrode); the power spectrum of multiple channels would look something like this instead:
ERPs (event-related potentials) are a particular kind of measure derived from EEG data. They can be considered short segments of continuous EEG data that are time-locked to a particular event (e.g. a stimulus). The idea behind ERPs is that by separating the EEG into time chunks we can observe a pattern in the brain's response to particular events.
Much of ERP research focuses on components, which are formally defined as peaks or troughs in the ERP time-domain waveform that have a consistent polarity, latency, and scalp distribution across trials.
Filtering typically means removing signals that are insignificant to the current study. One such case is any reading outside the frequency range of brain activity (roughly 1 to 30 Hz): any signal recorded outside this range (AC line frequency, environmental interference) is considered noise and may disrupt the readings we actually intend to observe. Removing signals above the upper bound is known as applying a low-pass filter cutoff (combined with a high-pass cutoff at the lower bound it forms a band-pass filter), and it is done before further preprocessing of the data.
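As a sketch of what this filtering step could look like, assuming the pipeline uses MNE-Python (the text above does not name a library) and a hypothetical raw recording file, a 1–30 Hz band-pass might be applied like this:

```python
import mne

# Hypothetical file path; any MNE-supported reader would work here.
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)

# Band-pass filter: high-pass at 1 Hz and low-pass at 30 Hz, so only the
# typical brain-activity band survives. Frequencies outside this range
# (AC line noise, slow drifts, environmental interference) are attenuated.
raw.filter(l_freq=1.0, h_freq=30.0)
```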
We also need to filter away artifacts (noise in the data attributed to a specific source) that are not relevant to the study. For example, a study on limb movement should filter out signals generated by eye blinks, since they are insignificant to the study but strongly affect the resulting signal.
Before filtering, we observe activity in unrelated signals (> 30 Hz).
After filtering, we observe that the activity of those unrelated signals starts to dampen.
ICA (Independent Component Analysis) takes a complex signal and separates it into mathematically independent components. Note that the term "components" here does not refer to ERP components, but to underlying signals that are mixed together.
For example, an audio recording of a conversation between two people is a signal mixed from person A, person B, and the environment. If we have three microphones picking up those three entities, ICA helps us unmix the recordings into their sources. In EEG, the electrode recordings are the mixed signals, and ICA helps us separate out the underlying sources for further analysis.
In terms of artifact removal, ICA can help us identify distinct sources that may indicate the presence of an artifact. ICA is good at capturing features of the data that explain the most variance. Things like eye blinks, eye movement, and muscle movement contribute high variance to the EEG data; however, low-frequency activity below one hertz also contributes variance.
Since we want ICA to pick up these non-brain artifact signals rather than slow drifts, we should apply a 1 Hz high-pass filter to the EEG data before fitting the ICA.
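A minimal sketch of this step, again assuming MNE-Python and a preloaded `raw` object from the earlier filtering step; the number of components and the excluded indices are purely illustrative:

```python
import mne
from mne.preprocessing import ICA

# Fit ICA on a 1 Hz high-pass-filtered copy so slow drifts do not
# dominate the decomposition; corrections are applied to the original data later.
raw_for_ica = raw.copy().filter(l_freq=1.0, h_freq=None)

ica = ICA(n_components=20, random_state=42)
ica.fit(raw_for_ica)

# Inspect the unmixed sources and mark artifact components,
# e.g. ica.exclude = [0, 3] for blink/muscle components (indices are illustrative).
ica.plot_sources(raw_for_ica)
ica.plot_components()
```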
Below are some examples of ICA components derived from the main signal.
The images above show the various sources that make up our main signal. With the right domain knowledge, one can identify which sources are unwanted and filter them out of the main signal. Without domain knowledge, one can instead filter out the ICA components that contribute the most noise or variance.
The variance calculation is usually handled by an external library such as autoreject in Python, which detects the high variance produced when the subject unintentionally does a one-time event like sneezing.
Below is an image of the logs from ICA filtering.
Do note that while ICA is similar to the Fourier transform (both separate signals), the Fourier transform separates a mixed signal based on its frequencies, whereas ICA separates the mixed signal based on its sources, each of which can contain many different frequencies.
Segmentation is the act of splitting continuous EEG data into time-locked events of interest. The continuous EEG data is split into slices based on the step size; each slice is one epoch.
After the epochs are split, we can apply the ICA corrections generated earlier. After that, we can use autoreject to automatically detect and fix noisy signals that ICA did not fix.
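A hedged sketch of the epoching and cleanup described above, assuming MNE-Python and the autoreject package; the event id, epoch window, and variable names are illustrative, not taken from this project:

```python
import mne
from autoreject import AutoReject

# `raw` (filtered Raw) and `ica` (fitted ICA with artifact components in
# ica.exclude) come from the earlier steps; the event code here is hypothetical.
events = mne.find_events(raw)
epochs = mne.Epochs(raw, events, event_id={"stimulus": 1},
                    tmin=-0.2, tmax=0.8, preload=True)

# Apply the ICA correction to the epoched data (drops the excluded components).
ica.apply(epochs)

# autoreject detects and repairs or rejects epochs that are still noisy.
ar = AutoReject(random_state=42)
epochs_clean = ar.fit_transform(epochs)
```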
When we do our usual training, there are tunable settings such as learning rate and momentum which stay constant (unchanged) during the learning process. One might overfit by tuning those settings against the evaluation data until the desired results appear. This can be mitigated with cross validation, since we still want to tweak those parameters (hyperparameters) for optimal test accuracy.
One of the cross validation methods is K-fold, which works as follows:
Below is the original split
We would train on the training data as well as tune hyperparameters. In cross validation, the training data is further split into validation and training data.
Any chunk of one K-th of the training data can become the validation data set, and cross validation uses every one of those combinations (splits).
We then train the model without the validation data for each split. The total score is the mean of the scores from the individual splits. When we make a change to a hyperparameter, we repeat this process until we get a satisfactory score.
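A small sketch of K-fold cross validation with scikit-learn; the text does not specify a library or model, so the logistic regression and the placeholder data below are just assumptions standing in for the per-epoch features and labels:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Placeholder data: X would be the per-epoch features, y the event labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)

# Candidate hyperparameter (regularization strength C is just an example).
model = LogisticRegression(C=1.0)

# 5-fold split: each fold serves as the validation set once,
# and the reported score is the mean over all five splits.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())
```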
https://builtin.com/data-science/step-step-explanation-principal-component-analysis
Principal component analysis is a dimensionality reduction and machine learning method used to simplify a large data set while still maintaining significant patterns and trends.
PCA can be broken down into five steps:
Principal components are new variables constructed as linear combinations or mixtures of the original variables. Geometrically speaking, principal components represent the directions in the data that explain the maximal amount of variance.
The first step is standardization, which is equivalent to Z-score scaling, so that all variables contribute equally to the analysis (variables with large ranges would otherwise dominate over variables with small ranges and give biased results).
The next step is to compute a covariance matrix. The aim of this step is to understand how the variables vary from the mean with respect to one another, i.e. how they are correlated with one another. The covariance matrix is a p × p matrix, where p is the number of dimensions, and its entries are the covariances of all possible pairs of variables.
Since the covariance of a variable with itself is its variance, the main diagonal holds the variances of each variable, and the entries are symmetric with respect to the main diagonal.
If the covariance is positive, then the variables are correlated. If it is negative, then they are inversely correlated.
In conclusion, the covariance matrix is just the table that summarizes the correlations between variables.
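A tiny NumPy example of what such a covariance matrix looks like, using placeholder data rather than anything from this project:

```python
import numpy as np

# Toy example: 100 observations of 3 variables (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # Z-score each column (standardization)

# p x p covariance matrix (p = 3 here); rowvar=False treats columns as variables.
cov = np.cov(X, rowvar=False)

# Diagonal entries are the variances of each variable (close to 1 after standardization);
# off-diagonal entries are symmetric: cov[i, j] == cov[j, i].
print(cov)
```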
The next step involves calculating the eigenvalues and eigenvectors of the covariance matrix, one pair for each dimension (feature) in your data.
Eigenvectors and eigenvalues come in pairs: the eigenvectors represent the directions of the axes with the most variance (the principal components), and the eigenvalues represent how much variance lies along each of those directions.
By ranking the eigenvectors in order of their eigenvalues, highest to lowest, we get the principal components in order of significance.
Below is an example of eigenvectors and eigenvalues for 2D variables:
We can deduce that the first principal component is v1 and the second principal component is v2 based on their ranking. To compute the percentage of variance (information) accounted for by each component, we divide the eigenvalue of each component by the sum of all eigenvalues.
In the next step, we can choose to discard the lower-ranked principal components or keep them. We then create a feature vector, which is just a matrix whose columns are the eigenvectors of the components we want to keep. Finally, we recast (project) the standardized data set onto the feature vector.
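Putting the eigendecomposition, variance percentages, and projection together, here is a minimal NumPy sketch that continues from the standardized placeholder matrix `X` in the earlier covariance example; the choice of keeping two components is illustrative:

```python
import numpy as np

# `X` is the standardized data matrix (n samples x p features) from before.
cov = np.cov(X, rowvar=False)

# Eigenvalue/eigenvector pairs of the covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse
# to rank the principal components by explained variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Percentage of variance explained by each component:
# each eigenvalue divided by the sum of all eigenvalues.
explained_ratio = eigvals / eigvals.sum()

# Feature vector: keep only the top-k eigenvectors (k = 2 here, illustrative),
# then recast (project) the standardized data onto them.
k = 2
feature_vector = eigvecs[:, :k]
X_projected = X @ feature_vector
```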
Example: suppose we have a data set like the one below and we want to project it onto a one-dimensional line while maintaining as much information as possible.
It would not make sense to project onto L2 because the data would be clumped together and indistinguishable; L1 is the better choice.
Calculating both L1 and L2 and choosing which one is best is exactly what PCA does.
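For comparison, scikit-learn's PCA performs this same "pick the best line" computation; the 2D toy data below is only a placeholder for the scatter in the example:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy 2-D data stretched along one direction (placeholder for the scatter above).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# PCA finds the best 1-D line to project onto (the L1 of the example)
# by maximizing the retained variance.
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)

print(pca.components_)                # direction of the chosen line
print(pca.explained_variance_ratio_)  # fraction of variance kept after projection
```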