# DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG (2017)

> [source](https://arxiv.org/pdf/1703.04046.pdf)

### Introduction

- The paper proposes a DL model named DeepSleepNet for automatic sleep stage scoring based on raw single-channel EEG, without any hand-engineered features extracted from the raw signal.
- Few prior methods encode the temporal information required for identifying the next sleep stage into their extracted features.
- This model uses CNNs with bidirectional LSTMs to learn stage-transition rules automatically from EEG epochs.
- The model uses 2 CNNs with different filter sizes at the first layer, followed by bidirectional LSTMs.
- The CNNs learn to extract time-invariant features from raw EEG.
- The bidirectional LSTMs are trained to encode temporal information, such as transition rules, into the model.
- A 2-step training algorithm is used to train the model end to end while also handling class imbalance.

### Model

- **Architecture** (a PyTorch sketch of the full model follows the Training section):
  - ![](https://i.imgur.com/exmTEUz.png)
- Representation learning:
  - 2 CNNs with small and large filter sizes are used to extract time-invariant features from raw EEG.
  - The small filters better capture temporal information; the large filters better capture frequency information.
  - CNN-small uses a first-layer filter size of Fs/2 with stride Fs/16, where Fs is the sampling rate.
  - CNN-large uses a first-layer filter size of 4Fs with stride Fs/2.
  - Each CNN has 4 conv layers and 2 max-pool layers.
  - Each conv layer performs 3 operations:
    - 1-D convolution
    - Batch norm
    - ReLU activation
  - The filter sizes, the number of filters, and the stride sizes are given in the diagram above.
  - The outputs of both CNNs are concatenated.
- Sequence residual learning:
  - This part has 2 components: bidirectional LSTMs and a shortcut connection.
  - The bidirectional LSTMs learn temporal information (sleep-stage transition rules).
  - The LSTMs use peephole connections, which let the gates inspect the current memory cell before modifying it.
  - The shortcut connection adds the temporal information learned from previous input sequences to the features extracted by the CNNs.
  - An FC layer transforms the CNN features into vectors that can be added to the LSTM output.
  - The FC layer performs a matrix multiplication, batch norm, and ReLU activation.

### Training

- A 2-step algorithm is used to train the model effectively and to tackle the class-imbalance problem: it first pre-trains the representation-learning part and then fine-tunes the entire model.
- **Pre-training:**
  - The 2 CNNs are stacked with a softmax layer (not the final one shown in the diagram) and pre-trained; this softmax layer and its parameters are discarded after pre-training.
  - The pre-model is trained on a class-balanced training set using the Adam optimizer.
  - The class-balanced set is obtained by duplicating the minority sleep stages in the dataset so that all stages have the same number of samples.
  - The lr, beta1, and beta2 used are 10<sup>-4</sup>, 0.9, and 0.999.
- **Fine-tuning:**
  - The CNNs start from the pre-trained parameters and are trained with a learning rate lr1 that is lower than in pre-training, while the sequence-learning part uses a higher learning rate lr2.
  - This is because, when the same learning rate was used for both, the CNN parameters adjusted excessively to the sequential data, which was not class-balanced, leading to overfitting.
  - Heuristic gradient clipping is used to avoid exploding gradients.
  - A dropout rate of 0.5 is used to reduce overfitting (applied only during training).
  - L2 weight decay is also used, with lambda = 10<sup>-3</sup>.
  - lr1 and lr2 are 10<sup>-6</sup> and 10<sup>-4</sup>, respectively.
  - The mean and variance of the training set, used as fixed batch-norm parameters during testing, were estimated by computing a moving average (decay rate 0.999) of the sampling mean and variance of each mini-batch.
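Putting the Model section together in code: below is a minimal PyTorch sketch, not the authors' implementation. It assumes Fs = 100 Hz (Sleep-EDF's EEG sampling rate), 30-s epochs, the 64/128 filter counts and pool sizes from the paper's diagram, and 512 LSTM units per direction (treat these sizes as approximate); PyTorch's `nn.LSTM` has no peephole connections, so those are omitted.

```python
import torch
import torch.nn as nn

FS = 100  # assumed sampling rate (Hz); Sleep-EDF EEG is sampled at 100 Hz

def conv_block(in_ch, out_ch, kernel, stride):
    # Each conv layer = 1-D convolution -> batch norm -> ReLU.
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel, stride, padding=kernel // 2, bias=False),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )

class CNNBranch(nn.Module):
    """One CNN feature extractor: 4 conv layers and 2 max-pool layers."""
    def __init__(self, first_kernel, first_stride, pool1, pool2):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(1, 64, first_kernel, first_stride),
            nn.MaxPool1d(pool1),
            nn.Dropout(0.5),
            conv_block(64, 128, 8, 1),
            conv_block(128, 128, 8, 1),
            conv_block(128, 128, 8, 1),
            nn.MaxPool1d(pool2),
        )

    def forward(self, x):          # x: (batch, 1, samples)
        return self.net(x).flatten(1)

class DeepSleepNet(nn.Module):
    def __init__(self, n_classes=5, hidden=512):
        super().__init__()
        # Small filters (Fs/2, stride Fs/16) for temporal detail,
        # large filters (4*Fs, stride Fs/2) for frequency content.
        self.small = CNNBranch(FS // 2, FS // 16, pool1=8, pool2=4)
        self.large = CNNBranch(4 * FS, FS // 2, pool1=4, pool2=2)
        feat = self._feat_dim()
        # Sequence residual learning: bidirectional LSTM + FC shortcut.
        self.lstm = nn.LSTM(feat, hidden, num_layers=2, batch_first=True,
                            bidirectional=True, dropout=0.5)
        self.shortcut = nn.Sequential(nn.Linear(feat, 2 * hidden),
                                      nn.BatchNorm1d(2 * hidden), nn.ReLU())
        self.out = nn.Linear(2 * hidden, n_classes)

    def _feat_dim(self):
        # Probe with one 30-s epoch to get the concatenated feature size.
        with torch.no_grad():
            x = torch.zeros(1, 1, 30 * FS)
            return self.small(x).shape[1] + self.large(x).shape[1]

    def forward(self, x):          # x: (batch, seq_len, 30*FS) raw epochs
        b, t, n = x.shape
        flat = x.reshape(b * t, 1, n)
        feats = torch.cat([self.small(flat), self.large(flat)], dim=1)
        rnn_out, _ = self.lstm(feats.reshape(b, t, -1))
        residual = self.shortcut(feats).reshape(b, t, -1)
        # The shortcut adds CNN features to the LSTM's temporal encoding.
        return self.out(rnn_out + residual)  # per-epoch class logits

# Usage: 2 sequences of 25 consecutive 30-s epochs -> per-epoch logits.
x = torch.randn(2, 25, 30 * FS)
print(DeepSleepNet()(x).shape)  # torch.Size([2, 25, 5])
```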
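And a sketch of the two-step training recipe, reusing `DeepSleepNet` from above. The oversampling helper, the dummy data, and the clipping threshold are illustrative assumptions; the L2 penalty is applied to all parameters here for brevity, and the batch-norm moving-average decay of 0.999 would correspond to `momentum=0.001` in PyTorch's batch-norm layers.

```python
import torch
import torch.nn as nn

def oversample_balance(X, y):
    """Duplicate minority sleep stages (sampling with replacement) so every
    class matches the largest class's count -- the class-balancing step."""
    classes, counts = torch.unique(y, return_counts=True)
    n_max = int(counts.max())
    idx = torch.cat([torch.where(y == c)[0][torch.randint(0, int(n), (n_max,))]
                     for c, n in zip(classes, counts)])
    idx = idx[torch.randperm(len(idx))]
    return X[idx], y[idx]

model = DeepSleepNet()  # from the architecture sketch above
criterion = nn.CrossEntropyLoss()
cnn_params = list(model.small.parameters()) + list(model.large.parameters())

# --- Step 1: pre-train the CNNs on the class-balanced set with Adam. ---
# A temporary softmax classifier is attached and discarded afterwards.
temp_head = nn.Linear(3200, 5)  # 3200 = concatenated CNN feature size above
pre_opt = torch.optim.Adam(cnn_params + list(temp_head.parameters()),
                           lr=1e-4, betas=(0.9, 0.999))
X = torch.randn(256, 3000)      # dummy 30-s epochs at 100 Hz
y = torch.randint(0, 5, (256,))
Xb, yb = oversample_balance(X, y)
feats = torch.cat([model.small(Xb.unsqueeze(1)),
                   model.large(Xb.unsqueeze(1))], dim=1)
loss = criterion(temp_head(feats), yb)
pre_opt.zero_grad(); loss.backward(); pre_opt.step()

# --- Step 2: fine-tune end to end on (imbalanced) epoch sequences. ---
# lr1 = 1e-6 for the pre-trained CNNs, lr2 = 1e-4 for the sequence part,
# plus L2 weight decay with lambda = 1e-3.
seq_params = (list(model.lstm.parameters()) + list(model.shortcut.parameters())
              + list(model.out.parameters()))
fine_opt = torch.optim.Adam([{"params": cnn_params, "lr": 1e-6},
                             {"params": seq_params, "lr": 1e-4}],
                            betas=(0.9, 0.999), weight_decay=1e-3)
seqs, labels = X.reshape(16, 16, 3000), y.reshape(16, 16)
loss = criterion(model(seqs).reshape(-1, 5), labels.reshape(-1))
fine_opt.zero_grad(); loss.backward()
# Gradient clipping against exploding LSTM gradients (threshold assumed).
nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
fine_opt.step()
```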
### Evaluation

- The datasets used were Sleep-EDF and MASS, with a subject-independent scheme for the train/test split.
- k-fold cross-validation was used to evaluate the model, with k = 31 for MASS and k = 20 for Sleep-EDF.
- In each fold, N - (N/k) subjects are used for training and N/k for testing, where N is the number of subjects.
- This process is repeated k times so that all recordings are tested.
- The predicted sleep stages from all folds are then combined and the metrics are computed over the pooled predictions (see the sketch at the end).
- The performance metrics used are per-class precision (PR), per-class recall (RE), per-class F1 score, macro-averaged F1 (MF1) score, and overall accuracy.
- ![](https://i.imgur.com/7mCpTMG.png)
- The results show that performance on stage N1 is noticeably lower than on the other stages, with an F1 score below 60 while the others were around 80-90.
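A sketch of this evaluation protocol, under the same hedges as above: `train_and_predict` is a hypothetical callback standing in for the full training pipeline, and the metrics come from scikit-learn.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_recall_fscore_support)

def subject_kfold_eval(subject_ids, k, train_and_predict, seed=0):
    """Subject-independent k-fold CV: each fold holds out ~N/k subjects,
    and predictions from all folds are pooled before computing metrics.
    `train_and_predict(train_subjects, test_subjects)` is a hypothetical
    callback that trains the model and returns (y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(subject_ids))
    y_true, y_pred = [], []
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        t, p = train_and_predict([subject_ids[i] for i in train_idx],
                                 [subject_ids[i] for i in test_idx])
        y_true.append(t)
        y_pred.append(p)
    y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)
    pr, re, f1, _ = precision_recall_fscore_support(y_true, y_pred)
    return {"per_class_PR": pr, "per_class_RE": re, "per_class_F1": f1,
            "MF1": f1_score(y_true, y_pred, average="macro"),
            "accuracy": accuracy_score(y_true, y_pred)}

# Usage: k = 20 for Sleep-EDF, k = 31 for MASS.
# metrics = subject_kfold_eval(subject_ids, k=20, train_and_predict=my_pipeline)
```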