# Dynamic Mode Decomposition Based Epileptic Seizure Detection from Scalp EEG (Group - 12) ## The problem definition This paper presents dynamic mode decomposition (DMD), a data-driven dimensionality reduction technique, originally used in fluid mechanics, as an instrument for epileptic seizure detection from scalp electroencephalograph (EEG) data. We need to create a model that is trained on CHB-MIT datasets and KU Leuven, so that given a new EEG recording, it can reliably detect an epileptic seizure with reasonable accuracy. ## Motivation Due to a number of developments over the past few years, including the global push toward digital health care, improvements in signal processing methods, and increased computational power of machines, there has been a resurgence in interest in the reliable detection of the onset of epileptic seizures. A trustworthy automated system could significantly enhance the quality of life for those with epilepsy. DMD is used to assess the signal power in various frequency bands. These signal curve lengths and subband-powers are employed as characteristics for training random under-sampling. ## Dynamic Mode Decomposition(data-driven dimensionality reduction technique) For highly dimensional complex systems, this is used to get lower order models of the systems. The assumption behind this technique is the existence of a low-dimensional structure underlying otherwise high-dimensional measurements.DMD takes advantage of the low dimensional nature of the experimental data, making it a strong choice for feature extraction from EEG data. The datasets from CHB-MIT and KU LEUVEN include numerous scalp EEG channels. * Consider X is n X m data matrix constructed by horizontal concatenation of these measurement vectors for m consecutive time instances from t=0 to t=m-1, X=[x<sub>0</sub>, x<sub>1</sub> ………x<sub>m-1</sub> ] * Similarly, X’ is the n X m data matrix containing observation vectors from time instant t =1 to t = m, X’=[x<sub>1</sub>, x<sub>2</sub> ………x<sub>m</sub>] * Assumption: there exists a linear operator A such that X’=AX. We compute a low-rank approximation à of A and its eigen decomposition using the DMD algorithm. DMD algorithm basically has four steps: 1. SVD: * X=UΣV<sup>*</sup> * X’=AUΣV<sup>*</sup> 2. $Ã=U^{*}AU=U^{*}X’VΣ^{-1}$ 3. Eigen decomposition of Ã: * ÃW=W∧ , W is matrix of eigenvectors and ∧ is diagonal matrix of DMD eigenvalues 4. DMD modes * Φ ≈X’V Σ<sup>-1</sup>W, each column of Φ is a DMD mode corresponding to the eigenvalue. So X can be approximated to composition of coupled spatio-temporal modes, X̂=Φexp(Ωt)z. Here,Ω=log(∧)/Δt,Δt is the time difference between consecutive snapshots, and z is the weight calculated for the first time instant such that x<sub>0</sub>=Φz. DMD mode powers are basically square of its vector magnitude. P<sub>i</sub>=|Φ<sub>i</sub>|<sup>2</sup> In order to increase the size of the data matrix X, the conventional DMD algorithm is modified by stacking h consecutive observation vectors x so that the number of rows in x<sub>j</sub>is at least twice the number of columns. The spatial resolution of the data is inflated artificially by data augmentation. Number of columns of X will be equal to the sampling frequency. The calculated eigenvalues are shown in the complex plane, with their magnitude denoting the mode's strength and their angle denoting the mode's frequency. The following equation can be used to convert the phase of eigenvalues to frequency(Hz):f<sub>i</sub>=|imag(ω<sub>i</sub>/2Π) Here, ω<sub>i</sub>=log(λ<sub>i</sub>)/Δt are the diagonal elements of Ω matrix and imag(x) denotes the imaginary part of a complex number x. The eigenvalues occur in conjugate pairs for the data matrix X. ## Methodology ### 1. Preprocessing In order to verify that all patients in the CHB-MIT dataset have the similar recording settings, a total of 18 channels are chosen for processing, with recordings from patient 12 being removed. These channels are: P3-O1, P4-O2, P7-O1, P8-O2, T7-P7, and T8-P8. They are also: C3-P3, C4-P4, CZ-PZ, F3-C3, F4-C4, F7-T7, F8-T8, FP1-F3, FP1-F7, FP2-F4, and FZ-CZ. For KU Leuven, no channel selection was necessary. The recordings are broken up into separate, one-second epochs that don't overlap. The post-processing stage follows with individual processing of each epoch. ### 2. Feature Extraction Now, we need to extract some features from the dataset. To do this, we need to be clear about a few things. #### 1. DMD mode powers: - DMD mode is defined as $\phi_{i}$ such that $\phi_{i}$ is a column of the matrix $\Phi\approx X'V\Sigma^{-1} W$ - Now, DMD mode power is the square of it's vector magnitude. - So, DMD mode power, $P_{i} = | \phi_{i} |^2$ #### 2. How the data is being measured: - There are 20 electrodes placed on the scalp (spatial). - Electrode sampling rate is 256 Hz (temporal). - This disparity in resolution results in less number of dynamical modes than required to fully capture the dynamics of the neurological activity. Thus, a modification to the standard DMD algorithm is adopted to augment the data matrix X by stacking h number of consecutive observation vectors x<sub>j</sub> such that the number of rows in X becomes at least twice the number of columns. - Every n<sup>th</sup> row of X contains measurement from the n<sup>th</sup> channel. - Every m<sup>th</sup> column of X contains the data at time instant m. Now, to understand what features we are extracting. We can extract features using two methods, #### 1. Using DMD Mode Powers - We are measuring the voltage values 256 times every second, i.e. 256 Hz. - So, for every half second, we are combining the DMD powers in sets of 4 measurements and storing them in one bin. - So, there are 32 bins in total. - Now, we normalize all the bins so that the sum of the DMD mode powers of all the bins add up to 1. ![](https://i.imgur.com/aZuy99o.png) #### 2. Using Curve-length - This is a very logical way to extract features. So, what we are doing is that output from every single channel is a feature. - Now, we quantify this output using curve-length as it is an indirect measurement of the amplitude and variation of time-domain. - Curve-length $I$<sub>c</sub> for a n<sup>th</sup> channel is defined as follows, $I$<sub>c</sub> $=\sum_{k = 0}^{m - 2} \sqrt{(X_{n, k+1} - X_{n, k})^2 + \Delta t^2}$ - Here, $k = 0 \to m-2$ refers to all the time instances, and $\Delta t$ is the sampling interval (1/256 seconds in this case). - So, all n channels are features and the $I_{c}$ is the value for every feature. - Curve lengths are calculated for each 1-second epoch of each channel of data. ![](https://i.imgur.com/B7AMek4.png) For this work, both DMD Mode Powers and Curve-lengths are used as features for this dataset. And hence, the total number of features are, - 50 for CHB-MIT dataset. - 54 for KU Leuven dataset. While this work primarily depends upon the use of spatio-temporal DMD modes for seizure detection, the curve-length feature plays a complementary role by providing additional temporal information about the EEG signal. ### 3. Classification - There are two types of recordings - completely seizure free (non-ictal) recording - and those containing one or more seizures, i.e. comprises of both Ictal and non-ictal regions. > Ictal is defined as the period of seizure. - Half the seizure-free recordings were used for training, and the other half for testing. - Recordings containing seizure recordings were split such that duration of seizures for both training and testing are the same. - No recording was split such that different portions of it were in both training and testing data. - $n$-fold cross-validation was employed for KU Leuven dataset, where $n = 27$ where $n$ is also the number of recordings. *The issue here, now remains that the data from one class (ictal recordings) is much less than from the other class(non-ictal). Therfore, there is an imbalance and hence, most traditionally used classifiers do not give promising results.* There are a few techniques we can use to solve this: #### 1. Data Sampling - *Over-sampling* - By adding samples of the smaller class. - *Under-sampling* - By removing samples of the larger class. #### 2. Boosting (AdaBoost) - boosts the strength of learners by initially building an iterative ensemble of models. - At each iteration, AdaBoost aims to correctly classify those samples, in the next iteration by adjusting weights, which were incorrectly classified in the current iteration. - By default classifies the data using decision tree classifier. #### 3. Boosting (RUSBoost) - RUSBoost is a hybrid approach that uses a combination of sampling and boosting. - The high-dimensional nature and imbalanced distribution of EEG data make RUSBoost algorithm an ideal candidate for seizure detection. - Since, RUSBoost is built on top of AdaBoost, it uses the decision tree classifer by default. ### 4. Postprocessing The transition between a non-seizure state to seizure state is not sudden and can take many epochs. The median length of seizures is from 18 seconds to 130 seconds. - A run-length smoothing filter is applied such that 10 consecutive positive detections is needed for the classifier to give positive predictions. This is to reduce false alarms. - Groups of 60 epochs each are made to avoid over-representation of positive detections in the case of multiple seizures in a time window. ## Results and Discussion To evaluate the performance better the metrics are defined as such: - *True Positive* (TP) is reported if positive detection occurs anytime between 1 minute before onset and 1 minute after the end of the seizure marked by an expert. - *False Positive* (FP) is reported when a positive detection occurs outside the seizure window, which consists of the seizure, 3 minutes of pre-seizure activity, and 3 minutes of post-seizure activity. - *True Negative* (TN) is a seizure free region reported correctly by the proposed algorithm. - *False Negative* (FN) is a seizure that the algorithm fails to detect. - *False Positive Rate* per hour is the average number of false alarms reported in an hour. - *Onset Latency* is the delay in the detection of seizure by the algorithm as compared to the ground truth marked by clinician. - ### CHB-MIT Dataset The classifiers are patient-specific and the parameters, have been chosen to get best results on per-patient basis. The results for performance metrics and their weighted averages is shown below. The weights are according to total number of seizures and epochs for each patient. ![](https://ieeexplore.ieee.org/mediastore_new/IEEE/content/media/6287639/8274985/8404027/hassa.t2-2853125-large.gif) It can be seen that sensitivity is close to 1 for most patients. The poor performance of Patient 16 is beacuse 4 out of 5 test seizures have duration < $10s$. Comparing our performance to Shoeb and Guttag, we have a sensitivity of $87$% as opposed to $96$% and a median false alarm rate of $4/24hr$ instead of $2/24hr$. But, Shoeb et al. has also used spatial and non-EEG features in their algorithm. - ### KU Leuven Dataset We use a $15s$ second smoothing here instead of $10s$ to combat the high false alarm rate, but sensitivity falls from $.93$ to $.88$ . ![](https://ieeexplore.ieee.org/mediastore_new/IEEE/content/media/6287639/8274985/8404027/hassa.t3-2853125-large.gif) If we exclude Patient 21 from the analysis, the false alarm rate drops to $0.54/hr$. This time we get performance comparable to $Hunyadi$ et al. Our algorithm shows higher sensitivity at the cost of a slightly high false alarm rate. The box-plots of sensitivity, specificity and false alarms per hour values of CHB-MIT and KU Leuven datasets is shown below: ![](https://ieeexplore.ieee.org/mediastore_new/IEEE/content/media/6287639/8274985/8404027/hassa5-2853125-large.gif) The values of sensitivity and specificity are generally close to 1 while the occurrence of false alarms per hour is generally low. ### Conclusion We have observed that the proposed approach achieves good sensitivity by using the features primarily derived from the spatio-temporal modes of DMD. Considering the noisy nature of scalp EEG, the proposed algorithm is robust and indicates its utility for miscellaneous applications, such as brain-computer interfaces, P-300 oddball paradigm among others. Further improvement in false alarm rate could be achieved if auxiliary features such as mean, variance, skewness, etc., are included. Another future direction could be the use of recently developed multi-resolution dynamic mode decomposition to further reduce the false alarm rate. --- ### Honour code of Group - 12: I shall be honest in my efforts and will make my parents proud. - Yash Yelmame 2020A7PS1224G I shall be honest in my efforts and will make my parents proud. - Stivon Bright 2020A7PS1705G I shall be honest in my efforts and will make my parents proud. - Aneesha Jain 2019B4A70071G