# DEEP CONVOLUTIONAL NEURAL NETWORKS FOR INTERPRETABLE ANALYSIS OF EEG SLEEP STAGE SCORING (2017)
>[source](https://arxiv.org/pdf/1710.00633.pdf)
- ### Abstract
    - Uses multitaper spectral analysis to create visually interpretable images of EEG sleep patterns as inputs for a deep convolutional network.
    - Transfer learning is applied to classify sleep stages in unseen patients.
- ### Sleep Stage scoring
    - Sleep quantification based on EEG alone is increasingly common due to the ease of acquiring EEG.
    - The AASM classifies sleep EEG into 5 stages (some papers combine these into 3 stages for better classification):
        - W (wakefulness): alpha (8-12 Hz) rhythm is present.
        - N1 (Non-REM 1): alpha (8-12 Hz) rhythm is attenuated and replaced by a mixed-frequency theta signal (4-7 Hz); decrease in muscle tone and slow eye movements.
        - N2 (Non-REM 2): presents K-complexes (a negative peak followed by a positive complex and a final negative voltage) in the <1.5 Hz range and sleep spindles (bursts of oscillatory waves) in the sigma (12-15 Hz) band.
        - N3 (Non-REM 3): slow-wave activity exists (0.5-3 Hz); eye movements are unusual.
        - R (REM): relatively low-amplitude and mixed-frequency activity in the EEG.
    - Time-frequency spectrogram images are created from windowed EEG signals and fed to a CNN pre-trained on a visual object recognition task, allowing the use of this powerful model for sleep stage classification in EEG data.
    - A challenge in EEG models is cross-validation: random sampling ignores the strong dependence between EEG data from the same patient or within a small timeframe, so proper (subject-wise) cross-validation is necessary.
    - This paper uses multitaper spectral estimation to generate colour image spectrograms which are fed to the CNN to score sleep stages.
    - To decide the sleep stages, the time-frequency content of interest is fed to the model.
- ### Transfer Learning with CNN
    - To avoid overfitting, most models use large datasets, which isn't possible with EEG data; hence transfer learning is applied.
    - According to studies, the lower layers learn more general features while the higher layers capture domain-specific representations.
- ### Image creation
    - EEG data is passed in as time-frequency images. Spectral estimation is based on Fourier analysis, but its assumptions aren't satisfied by EEG, so the raw spectrogram is highly biased.
    - This bias can be tackled by convolving the raw signal with a window function (taper).
    - A remaining drawback of the spectrogram is that its estimates have high variance at every frequency, and this variance is even increased by applying a single taper.
    - Thus multitaper spectral estimation is used, which applies multiple orthogonal tapers to the raw signal and averages their resulting spectra, reducing the variance.
    - The important hyperparameters are:
        - Window size ω (in seconds).
        - Window step size σ (in seconds).
        - Minimum frequency resolution f (in Hz).
        - Time-bandwidth product W = ωf/2.
        - Number of tapers, set as L = ⌊2W⌋ − 1, where ⌊x⌋ is the greatest integer (floor) function.
    - After computing the multitaper spectrogram, it is converted to a logarithmic scale, x = log(x) + 1, and split into equal bins of size s (in seconds) called epochs.
    - Each epoched spectrogram is converted to an RGB colour matrix by applying a fixed colourmap (a sketch of the full pipeline follows below).
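As a concrete illustration of the image-creation steps above, here is a minimal Python sketch, not the authors' code: it assumes `scipy`'s DPSS (Slepian) tapers and `matplotlib`'s colourmaps, a synthetic signal in place of real EEG, and the hyperparameter values reported under Evaluation below.

```python
import numpy as np
from scipy.signal.windows import dpss
from matplotlib import cm

fs = 100                    # Sleep-EDF sampling rate (Hz)
omega, sigma = 3.0, 0.67    # window size / step size (s), as in Evaluation
f_res = 2.0                 # minimum frequency resolution (Hz)
W = omega * f_res / 2       # time-bandwidth product -> 3.0
L = int(np.floor(2 * W)) - 1    # number of tapers -> 5

def multitaper_spectrogram(x, fs, omega, sigma, W, L):
    """Average the tapered periodograms over L Slepian (DPSS) tapers."""
    n_win, step = int(omega * fs), int(sigma * fs)
    tapers = dpss(n_win, W, L)                      # shape (L, n_win)
    cols = []
    for start in range(0, len(x) - n_win + 1, step):
        seg = x[start:start + n_win]
        power = np.abs(np.fft.rfft(tapers * seg, axis=1)) ** 2
        cols.append(power.mean(axis=0))             # average across tapers
    return np.array(cols).T                         # (freq bins, time bins)

# one s = 150 s epoch of synthetic "EEG" (placeholder for a real recording)
x = np.random.randn(150 * fs)
S = multitaper_spectrogram(x, fs, omega, sigma, W, L)

logS = np.clip(np.log(S) + 1, 0.0, 1.0)    # log scale, thresholded to [0, 1]
rgb = cm.jet(logS)[..., :3]                # 'Jet' colourmap -> RGB colour matrix
# rgb is then resized to 224 x 224 and fed to the pre-trained VGGNet
```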
- ### Architecture
    - The images are trained on VGGNet [10] with 16 weighted layers:
        - [ccm<sub>64</sub> ccm<sub>128</sub> cccm<sub>256</sub> cccm<sub>512</sub> cccm<sub>512</sub> fcr<sub>4096</sub> fcr<sub>4096</sub> fcs<sub>1000</sub>]
        - c is a 3 × 3 convolutional filter of stride 1 using a ReLU activation function.
        - m stands for a 2 × 2 max-pooling layer with a stride of 2.
        - fcr and fcs correspond to fully-connected layers with ReLU and soft-max activations, respectively.
        - Sub-indexed values represent the number of channels in each block.
    - Transfer learning is employed by using weight values in all convolutional layers that have been previously trained on ILSVRC-2014 data.
    - Fully-connected layers are initialised from scratch using Xavier's initialisation and trained using dropout. The number of final outputs is set according to the task being tackled.
- ### Network Visualisation
    - This paper uses sensitivity analysis as a visualisation tool to understand the network.
    - Formally, let D = {x<sub>n</sub>, t<sub>n</sub>}<sup>N</sup><sub>n=1</sub> be a dataset of P-dimensional input vectors x (spectral images) and corresponding class labels t ∈ {1, . . . , C}; the built ANN acts as a function approximator, such that t̂ = f(x).
    - We can estimate the relative importance that our network places on every input feature j (i.e., RGB colour channel in a pixel) to discriminate among the existing classes as:
    - s<sup>(j)</sup> = (1/N) Σ<sup>N</sup><sub>n=1</sub> |∂Ł(t<sub>n</sub>, f(x<sub>n</sub>)) / ∂x<sub>n</sub><sup>(j)</sup>|
    - where Ł is the loss function of choice and |x| is the absolute value of x. Sensitivity maps are created by arranging the s<sup>(j)</sup> values in the corresponding RGB colour matrix, forming an image.
    - Since most current frameworks for building ANNs provide automatic differentiation, calculating sensitivity maps reduces to a simple function call (see the sensitivity sketch after the Evaluation section).
- ### Evaluation
    - Data used is EEG sleep recordings from the Sleep-EDF Database in the PhysioNet repository.
    - In particular, a subset of data from a study of age effects on sleep in healthy subjects, containing two whole-night EEG recordings from Fpz-Cz and Pz-Oz channels sampled at 100 Hz and corresponding hypnograms (expert annotations of sleep stages) from 20 subjects (10 males and 10 females) between 25-34 years old.
    - Sleeping time was retrieved from each recording as the interval between annotated lights-off and lights-on times, or from 15 minutes before/after the first/last scored sleep epoch if these annotations were not provided. Class labels were obtained from the hypnograms every 30 s.
    - Images were created for the Fpz-Cz sensor, setting ω = 3.0 s, f = 2 Hz, W = 3 and L = 5 tapers, with the purpose of capturing the sleep dynamics at the micro-event time scale while maintaining a reasonably fine resolution.
    - The window step size was set to σ = 0.67 s in order to match the final image resolution (fixed to 224 × 224 pixels by the pre-trained VGGNet): 150 s / 0.67 s ≈ 224 time bins.
    - Bin size was set to s = 150 s, corresponding to the current 30 s epoch plus the two previous and two posterior epochs, as this has been shown to improve overall accuracy by better classifying N1-N2, N1-R and N2-R transition stages.
    - Spectrogram log values were thresholded to the [0, 1] interval before applying the 'Jet' colourmap to generate the images.
    - The train/test split was done subject-wise, selecting 15 patients for training and the rest for validation.
    - CNNs were trained by optimising the categorical cross-entropy between predicted values and class labels using the Adam optimiser on mini-batches of 250 training examples, with a learning rate of 10<sup>-5</sup> and decay rates of the first and second moments set to 0.9 and 0.999, respectively; dropout was used in the fully-connected layers.
    - The validation set was employed to choose the hyper-parameters, and its loss served as a stopping criterion to avoid overfitting.
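Putting the Architecture and Evaluation details together, a hedged PyTorch sketch of the two training scenarios (VGG-FE / VGG-FT, described under Results below) might look as follows; `torchvision`'s pretrained VGG-16 stands in for the paper's model, and the subject ids and data loading are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5            # W, N1, N2, N3, R
FEATURE_EXTRACTION = True  # True -> VGG-FE (frozen convs), False -> VGG-FT

# convolutional weights pre-trained on ILSVRC (ImageNet) data
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

if FEATURE_EXTRACTION:
    for p in model.features.parameters():   # fix all convolutional layers
        p.requires_grad = False

# replace the 1000-way ILSVRC output with the sleep-stage head;
# FC layers are initialised from scratch (Xavier) and keep their dropout
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)
for layer in model.classifier:
    if isinstance(layer, nn.Linear):
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

# subject-wise split: epochs from one recording never cross the split
subjects = list(range(20))                  # hypothetical subject ids
train_subjects, val_subjects = subjects[:15], subjects[15:]

# Adam with the reported settings (categorical cross-entropy,
# mini-batches of 250 images, lr 1e-5, betas (0.9, 0.999))
optimiser = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5, betas=(0.9, 0.999))
loss_fn = nn.CrossEntropyLoss()
```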
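Following up on the Network Visualisation section, the "simple function call" amounts to one backward pass; this minimal sketch assumes the `model` and `loss_fn` above and a batch `x, t` of spectrogram images and labels.

```python
import torch

x = x.clone().requires_grad_(True)   # track gradients w.r.t. input pixels
model.eval()
loss = loss_fn(model(x), t)
loss.backward()                      # autodiff does the heavy lifting

# s^(j): average absolute gradient of the loss per RGB pixel feature
s = x.grad.abs().mean(dim=0)         # shape (3, 224, 224)
# arranging s as an RGB colour matrix yields the sensitivity map image
```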
- ### Results
    - They use 2 scenarios:
        - VGG-FE, where VGGNet is used as a feature extractor: all convolutional layers are fixed and only the 3 fully-connected layers are trained from scratch (convergence time is much longer than for VGG-FT).
        - VGG-FT, where all weights are updated to obtain a fine-tuned network.
    - The results obtained show that classification is weakest for the N1 stage, with many N1 epochs misassigned.
    - This happens largely because N1 and R epochs share similar mixed-frequency EEG activity, so the EEG-only images miss information that would discriminate the N1-R transition.
- ### Conclusion
    - Classification of sleep stages can be effectively framed as a visual task by first creating natural-colour-like images using multitaper spectral estimation and then applying recent achievements in the object recognition field to obtain state-of-the-art classification accuracy.
    - Further improvement of the method includes better hyperparameter optimisation when generating the spectral images.