# DEEP CONVOLUTIONAL NEURAL NETWORKS FOR INTERPRETABLE ANALYSIS OF EEG SLEEP STAGE SCORING (2017)
>[source](https://arxiv.org/pdf/1710.00633.pdf)
- ### Abstract
- Uses multitaper spectral analysis to create visually interpretable images of EEG sleep patterns as inputs to a deep convolutional network.
- Transfer learning is applied to classify sleep stages in unseen patients.
- ### Sleep Stage scoring
- Sleep quantification based on EEG alone is increasingly common due to the wide availability of EEG.
- The American Academy of Sleep Medicine (AASM) classifies sleep into 5 stages (some papers combine these into 3 stages for better classification):
- W (wakefulness): alpha (8-12 Hz) rhythm is present.
- N1 (Non-REM 1): alpha (8-12 Hz) rhythm is attenuated and replaced by mixed frequency theta signal (4-7 Hz), decrease in muscle tone and slow eye movements.
- N2 (Non-REM 2): presents K-complexes (negative peak followed by a positive complex and a final negative voltage) in the <1.5 Hz range and sleep spindles (burst of oscillatory waves) at sigma (12-15 Hz) band.
- N3 (Non-REM 3): slow wave activity is present (0.5-3 Hz); eye movements are uncommon.
- R (REM): relatively low-amplitude and mixed-frequency activity in the EEG.
- Time-frequency spectrogram images are created from windowed EEG signals and fed to a CNN pre-trained on a visual object recognition task, allowing this powerful model to be reused for sleep stage classification of EEG data.
- A challenge for EEG models is cross-validation: random sampling ignores the strong dependence between EEG data from the same patient or within a short time frame, so a proper patient-wise cross-validation scheme is necessary (a minimal split sketch appears at the end of this section).
- This paper uses multitaper spectral estimation to generate colour spectrogram images, which are fed to the CNN for sleep stage scoring.
- To decide the sleep stage, the time-frequency ranges of interest are fed to the model.
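- A minimal sketch of such a patient-wise split using scikit-learn's GroupKFold (variable names and dummy data are illustrative, not from the paper):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Dummy data: 20 patients with 50 epochs each (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))             # epoch features
y = rng.integers(0, 5, size=1000)           # sleep stage labels
patient_ids = np.repeat(np.arange(20), 50)  # which patient each epoch belongs to

# GroupKFold guarantees that epochs of one patient never span train and test
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patient_ids):
    assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
```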
- ### Transfer Learning with CNN
- To avoid overfitting, most models rely on large datasets, which is not possible with EEG data; hence transfer learning is applied.
- Studies show that the lower layers learn general features while the higher layers capture domain-specific representations.
- ### Image creation
- EEG data is passed to the network as time-frequency images. Spectral estimation is based on Fourier analysis, but its assumptions are not satisfied by EEG signals, so the naive spectrogram is a highly biased estimate.
- This bias can be tackled by convolving the raw signal with a window function (taper).
- The remaining drawback is that the spectrogram estimate has high variance across all frequencies, and this variance is increased by using a taper.
- Thus multitaper spectral estimation is used, which applies multiple orthogonal tapers to the raw signal and averages their results (a minimal sketch follows the hyperparameter list below).
- The important hyperparameters are:
- Window size ω (in seconds).
- Window stepsize σ (in seconds).
- Minimum frequency resolution f (in Hz).
- Time-bandwidth product W = ωf/2.
- Number of tapers, set as L = ⌊2W⌋ − 1, where ⌊x⌋ is the greatest integer (floor) function.
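- A minimal sketch of multitaper spectral estimation with these hyperparameters, using SciPy's DPSS (Slepian) tapers (function and variable names are mine, not the paper's):

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrogram(x, fs, omega=3.0, sigma=0.67, W=3):
    """Multitaper spectrogram with the paper's notation: omega = window size (s),
    sigma = window step (s), W = time-bandwidth product."""
    n_win, n_step = int(omega * fs), int(sigma * fs)
    L = int(2 * W) - 1                     # number of tapers: L = floor(2W) - 1
    tapers = dpss(n_win, W, L)             # Slepian (DPSS) tapers, shape (L, n_win)

    frames = []
    for start in range(0, len(x) - n_win + 1, n_step):
        seg = x[start:start + n_win]
        # Average the power spectra of the L independently tapered copies
        frames.append(np.mean(np.abs(np.fft.rfft(tapers * seg, axis=1)) ** 2, axis=0))
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    return freqs, np.array(frames).T       # (n_freqs, n_frames)
```

- With ω = 3.0 s, σ = 0.67 s and W = 3 (the values used in the Evaluation section), this yields L = 5 tapers.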
- The multitaper spectrogram is then converted to a logarithmic scale, x → log(x) + 1, and split into equal bins of size s, called epochs.
- Each epoched spectrogram is converted to an RGB colour matrix using a fixed colourmap (a sketch follows).
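- A minimal sketch of the image-creation step (the log transform, [0, 1] thresholding and 'Jet' colourmap follow the paper's Evaluation section; the function name is mine):

```python
import numpy as np
from matplotlib import cm

def epoch_to_rgb(spec_epoch):
    # Logarithmic scale as in the paper: x -> log(x) + 1
    x = np.log(spec_epoch) + 1.0
    # Threshold log values to the [0, 1] interval before colour mapping
    x = np.clip(x, 0.0, 1.0)
    # 'Jet' colourmap returns RGBA values in [0, 1]
    rgba = cm.jet(x)
    # Keep the RGB channels as an 8-bit image; resizing to the network's
    # 224 x 224 input is left to the image pipeline
    return (rgba[..., :3] * 255).astype(np.uint8)
```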
- ### Architecture
- The images are used to train a VGGNet [10] with 16 weighted layers:
- [ccm<sub>64</sub>ccm<sub>128</sub>cccm<sub>256</sub>cccm<sub>512</sub>cccm<sub>512</sub>fcr<sub>4096</sub>fcr<sub>4096</sub>fcs<sub>1000</sub>]
- c is a 3 × 3 convolutional filter of stride 1 using a ReLU activation function.
- m stands for 2 × 2 maxpooling layer with a stride of 2.
- fcr and fcs correspond to fully-connected layers with ReLU and soft-max activations, respectively.
- sub-indexed values represent the number of channels in each block.
- Transfer learning is employed by reusing, in all convolutional layers, weight values previously trained on ILSVRC-2014 data.
- Fully-connected layers are initialised from scratch using Xavier’s initialisation and trained using dropout. The number of final outputs is set according to the task we are tackling.
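- A minimal sketch of this setup using torchvision's VGG16 (the paper does not state its framework; `n_classes` and the layer freezing shown here, which corresponds to the VGG-FE scenario of the Results section, are assumptions):

```python
import torch.nn as nn
from torchvision import models

n_classes = 5  # W, N1, N2, N3, R (set according to the task at hand)

# Load VGG16 with convolutional weights pre-trained on ImageNet (ILSVRC)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# VGG-FE scenario: keep all convolutional layers fixed ...
for p in model.features.parameters():
    p.requires_grad = False

# ... and re-initialise the fully-connected head from scratch
# (Xavier initialisation, dropout, task-specific number of outputs)
model.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(4096, n_classes),
)
for m in model.classifier:
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)
```

- For the VGG-FT scenario of the Results section, the freezing loop is simply omitted so that all weights are updated.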
- ### Network Visualisation
- This paper uses sensitivity analysis as a visualisation tool to understand the network.
- Formally, let D = {x<sub>n</sub>, t<sub>n</sub>}<sup>N</sup><sub>n=1</sub> be a dataset of P-dimensional input vectors x (spectral images) and corresponding class labels t ∈ {1, . . . , C}; the trained ANN acts as a function approximator, such that t̂ = f(x).
- We can estimate the relative importance that the network places on every input feature j (i.e., an RGB colour channel in a pixel) for discriminating among the existing classes as:
- s<sup>(j)</sup> = | ∂Ł(t, f(x)) / ∂x<sup>(j)</sup> |
- where Ł is the loss function of choice and |x| is the absolute value of x. Sensitivity maps are created by arranging the s<sup>(j)</sup> values in the corresponding RGB colour matrix, forming an image.
- Since most current frameworks for building ANNs provide automatic differentiation, calculating sensitivity maps reduces to a simple function call (see the sketch below).
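- A minimal sketch of this computation in PyTorch (the framework choice and function name are mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def sensitivity_map(model, image, label):
    """Sensitivity s^(j) = |dL/dx^(j)| obtained via automatic differentiation.
    image: (1, 3, 224, 224) input tensor; label: (1,) tensor with the class index."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # loss function of choice
    loss.backward()                              # autodiff yields dL/dx
    return image.grad.abs().squeeze(0)           # (3, 224, 224) sensitivity map
```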
- ### Evaluation
- Data used is EEG sleep recordings from the Sleep-EDF Database in the PhysioNet repository.
- In particular, a subset of data from a study of age effects on sleep in healthy subjects, containing two whole-night EEG recordings from Fpz-Cz and Pz-Oz channels sampled at 100 Hz and corresponding hypnograms (expert annotations of sleep stages) from 20 subjects (10 males and 10 females) between 25-34 years old.
- Sleeping time was retrieved from each recording as the interval between annotated lights off and lights on times or from 15 minutes before/after the first/last scored sleep epoch, if these annotations were not provided. Class labels were obtained from the hypnograms at every 30 s.
- Images were created for the Fpz-Cz sensor, setting ω = 3.0 s, f = 2 Hz, W = 3 and L = 5 tapers, with the purpose of capturing sleep dynamics at the micro-event time scale while maintaining a reasonably fine resolution.
- The window stepsize was set to σ = 0.67 s in order to match the final image resolution (fixed in advance to 224 × 224 pixels by the pre-trained VGGNet).
- Bin size was set to s = 150 s, corresponding to the current epoch plus the two previous and two following epochs, as this has been shown to improve overall accuracy by better classifying N1-N2, N1-R and N2-R transition stages.
- Spectrogram log values were thresholded to the [0, 1] interval before applying the ‘Jet’ colourmap to generate the images.
- The train/validation split was done by patient: 15 patients for training and the remaining 5 for validation.
- CNNs were trained by optimising the categorical cross-entropy between predicted values and class labels using the Adam optimiser on mini-batches of 250 training examples, with a learning rate of 10<sup>-5</sup> and decay rates of the first and second moments set to 0.9 and 0.999, respectively; dropout was used for regularisation (see the training sketch at the end of this section).
- The validation set was employed to choose the hyper-parameters and its loss as a stopping criterion to avoid overfitting.
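- An illustrative training loop under the stated settings (`model` and `train_loader` are assumed from the earlier sketches, and `val_loss` is a hypothetical helper):

```python
import torch

# Adam with lr = 1e-5, beta1 = 0.9, beta2 = 0.999; mini-batches of 250 examples
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5, betas=(0.9, 0.999),
)
criterion = torch.nn.CrossEntropyLoss()  # categorical cross-entropy

best = float("inf")
for epoch in range(100):                 # illustrative epoch budget
    model.train()
    for images, labels in train_loader:  # batches of 250 spectrogram images
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    v = val_loss(model)                  # validation loss as stopping criterion
    if v >= best:
        break                            # stop when validation loss worsens
    best = v
```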
- ### Results
- They use 2 scenarios:
- VGG-FE, where VGGNet is used as a feature extractor: all convolutional layers are fixed and only the 3 fully-connected layers are trained from scratch (convergence takes much longer than with VGG-FT).
- VGG-FT, where all weights are updated to obtain a fine-tuned network.
- The results show that classification of the N1 stage is the weakest, with many of its epochs assigned to other stages.
- This happens because distinguishing N1 from R misses important information when only EEG is available (e.g., the eye movements and muscle tone noted in the stage definitions above).
- ### Conclusion
- Classification of sleep stages can be effectively framed as a visual task: first create natural-colour-like images using multitaper spectral estimation, then apply recent advances from the object recognition field to obtain state-of-the-art classification accuracy.
- Further improvements to the method include better hyperparameter optimisation when generating the spectral images.