# Automatic Sleep Stage Classification Using Single-Channel EEG: Learning Sequential Features with Attention-Based Recurrent Neural Networks Notes

Paper Link: [kar.kent Link](https://kar.kent.ac.uk/72660/1/Phan2018d.pdf)

## Introduction

The authors propose deep bidirectional RNNs with an attention layer to classify single-channel EEG data. They treat the network as a feature extractor and feed the resulting feature vectors to a linear SVM, which classifies the sleep stage. The authors also propose a discriminative method that uses a DNN to learn a filter bank for preprocessing; filtering the frame-wise feature vectors with this learned filter bank beforehand further improves classification performance. The proposed approach demonstrates good performance on the Sleep-EDF dataset.

## Pre-Processing

A 30-second EEG epoch is first decomposed into small frames and transformed into a sequence of frame-wise feature vectors; log-power spectral coefficients are used as the frame-wise representation. The EEG signal is split into overlapping frames two seconds long with 50% overlap, giving T = 29 frames per epoch. Each frame is transformed into the frequency domain via a 256-point discrete Fourier transform (DFT) with a Hamming window, followed by logarithmic scaling to obtain a log-power feature vector of size F = 129. Afterwards, dimension reduction and frequency smoothing are performed by filtering the log-power feature vectors with a frequency-domain triangular filter bank of M = 20 filters, equally spaced with 50% overlap.

As an alternative to the regular triangular filter bank above (Section II-A of the paper), the authors train a tailored DNN to learn a filter bank discriminatively. The proposed DNN consists of one filter-bank layer, three fully-connected (FC) layers, and one softmax layer. The filter-bank layer is itself a fully-connected layer on which several constraints are enforced so that it behaves like a filter bank; the FC layers use the ReLU activation function.

\\(W_{fb} = f_{+}(W) \odot S\\)

Here, \\(W_{fb}\\) is the weight matrix of the filter-bank layer, \\(f_{+}\\) is a function that makes \\(W\\) non-negative (the authors use the sigmoid), and \\(S\\) is a linear-frequency triangular filter-bank matrix that constrains the shape of the learned filters (see the code sketch after the architecture section).

## Neural Network Architecture

The architecture is straightforward: two layers of bidirectional GRUs are applied to the sequence of frame-wise feature vectors obtained in pre-processing, and the GRU outputs are passed through an attention layer. GRUs are used because they are computationally lighter than LSTMs. The attention layer assigns a weight to the output vector of each frame and combines the weighted outputs into a single feature vector, so the attention weights decide how relevant each frame is to the final representation. During training, this feature vector is passed through a softmax layer for sleep-stage classification. After training, the softmax layer is removed and the attention-derived feature vector is classified with a linear SVM instead. In general, the SVM tends to generalize better than the softmax thanks to its maximum-margin property [1]. The feature vectors extracted from the training examples are used to train the SVM classifier, which is then employed to classify the feature vectors extracted from the test examples.
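Below is a minimal PyTorch-style sketch of the attention-based bidirectional GRU feature extractor described above, using the hyperparameters listed later in these notes. The module names, the additive-attention formulation, and the assumption of 5 sleep stages are mine, not the authors'.

```python
import torch
import torch.nn as nn


class AttentionBiGRU(nn.Module):
    """Sketch of the attention-based bidirectional RNN feature extractor."""

    def __init__(self, input_size=20, hidden_size=256, num_layers=2,
                 attn_size=96, num_classes=5, dropout=0.2):
        super().__init__()
        # Two bidirectional GRU layers over the T frame-wise feature vectors
        # (each of size M = 20 after filter-bank smoothing).
        self.gru = nn.GRU(input_size, hidden_size, num_layers=num_layers,
                          batch_first=True, bidirectional=True, dropout=dropout)
        # Additive attention: score each frame, softmax over frames, weighted sum.
        self.attn_proj = nn.Linear(2 * hidden_size, attn_size)
        self.attn_score = nn.Linear(attn_size, 1, bias=False)
        # Classification layer used only while training the network; afterwards
        # it is dropped and the pooled feature vector goes to a linear SVM.
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, T, input_size), with T = 29 frames per 30-second epoch
        out, _ = self.gru(x)                                        # (batch, T, 2*hidden)
        scores = self.attn_score(torch.tanh(self.attn_proj(out)))   # (batch, T, 1)
        weights = torch.softmax(scores, dim=1)                      # attention weight per frame
        feature = (weights * out).sum(dim=1)                        # (batch, 2*hidden)
        return self.classifier(feature), feature
```

After training, only `feature` would be used: it is extracted for every epoch and fed to the linear SVM for the final classification.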
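Going back to the learned filter bank from the Pre-Processing section, here is a sketch of the constrained filter-bank layer \\(W_{fb} = f_{+}(W) \odot S\\), again with hypothetical names and PyTorch assumed:

```python
import torch
import torch.nn as nn


class FilterBankLayer(nn.Module):
    """Sketch of the constrained filter-bank layer: W_fb = sigmoid(W) * S."""

    def __init__(self, tri_fbank):
        # tri_fbank: fixed (F, M) linear-frequency triangular filter-bank matrix S,
        # with F = 129 frequency bins and M = 20 filters.
        super().__init__()
        self.register_buffer("S", tri_fbank)
        # W is the learnable parameter; the sigmoid keeps its entries non-negative
        # and S confines each learned filter to its triangular frequency band.
        self.W = nn.Parameter(0.01 * torch.randn_like(tri_fbank))

    def forward(self, x):
        # x: (..., F) log-power spectrum of one frame
        W_fb = torch.sigmoid(self.W) * self.S   # element-wise product, shape (F, M)
        return x @ W_fb                         # (..., M) smoothed filter-bank features
```

As described in the notes above, this layer sits in front of the three FC layers and the softmax of the filter-bank-learning DNN; once that DNN is trained, the resulting \\(W_{fb}\\) replaces the hand-crafted triangular filter bank in pre-processing.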
## Loss Function and Optimizer

The network is trained with the cross-entropy loss. L2 regularization is applied to all network parameters, and dropout is used on the GRU layers for further regularization. Training uses the Adam optimizer.

## Hyperparameters

#### Attention-based Deep RNN

| Parameter | Value |
|-----------|-------|
| Number of layers L | 2 |
| Size of hidden state vector | 256 |
| Size of attention weights | 96 |
| Dropout rate | 0.2 |
| Regularization parameter | 10<sup>−4</sup> |

#### Filter-Bank Learning DNN

| Layer | Size | Dropout |
|-------|------|---------|
| FC 1 | 512 | 0.2 |
| FC 2 | 256 | 0.2 |
| FC 3 | 512 | 0.2 |

## Other References

1. B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. COLT, 1992, pp. 144–152.
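For completeness, a hedged sketch of the training configuration from the Loss Function and Optimizer section, using the hyperparameters tabulated above; the learning rate and batch handling are assumptions not given in these notes, and the L2 term is approximated here with Adam's weight decay.

```python
import torch
import torch.nn as nn

# AttentionBiGRU is the hypothetical module from the architecture sketch above.
model = AttentionBiGRU()
criterion = nn.CrossEntropyLoss()                 # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,             # assumed; not listed in these notes
                             weight_decay=1e-4)   # stands in for the L2 penalty


def train_step(x, y):
    """One update: x is (batch, T, M) filter-bank sequences, y is (batch,) stage labels."""
    model.train()
    optimizer.zero_grad()
    logits, _ = model(x)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```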