# DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG (2017)
>[source](https://arxiv.org/pdf/1703.04046.pdf)
- ### Introduction
- The paper proposes a deep learning model, DeepSleepNet, for automatic sleep stage scoring based on raw single-channel EEG, without applying any hand-engineered feature extraction to the raw signal.
- Few existing methods encode into their extracted features the temporal information, such as stage-transition rules, required for identifying the next sleep stage.
- This model uses CNNs with bidirectional LSTMs to learn such transition rules automatically from sequences of EEG epochs.
- The model utilises two CNNs with different filter sizes at the first layer, followed by bidirectional LSTMs.
- The CNNs learn to extract time-invariant features from the raw EEG.
- The bidirectional LSTMs are trained to encode temporal information, such as transition rules, into the model.
- A two-step training algorithm is used to train the model end-to-end while also handling class imbalance.
- ### Model
- **Architecture:**
- Representation Learning:
- Two CNNs, with small and large filter sizes, are used to extract time-invariant features from the raw EEG.
- The small filters better capture temporal information, while the large filters better capture frequency information.
- CNN-small uses a filter size of Fs/2 with stride Fs/16, where Fs is the sampling rate (e.g., a 50-sample filter with stride ~6 at Fs = 100 Hz).
- CNN-large uses a filter size of 4Fs with stride Fs/2 (e.g., a 400-sample filter with stride 50 at Fs = 100 Hz).
- Each CNN has 4 conv layers and 2 maxpool layers.
- Each conv layer performs 3 operations:
- 1-D convolution
- Batchnorm
- ReLU activation
- The filter sizes, number of filters, and strides are given in the paper's architecture diagram.
- The outputs from both CNNs are concatenated, as in the sketch below.
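A minimal PyTorch sketch of this dual-CNN feature extractor, assuming a 30-second epoch sampled at Fs Hz. The filter counts (64, then 128) and pooling sizes follow my reading of the paper's architecture figure, so treat them as indicative rather than exact, and the padding choice is mine.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, stride):
    """One conv layer as described above: 1-D convolution -> batch norm -> ReLU."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel, stride=stride, padding=kernel // 2),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
    )

class TwoBranchCNN(nn.Module):
    """Dual CNN branches (4 conv + 2 maxpool layers each), outputs concatenated."""

    def __init__(self, fs=100):  # fs = EEG sampling rate in Hz
        super().__init__()
        self.small = nn.Sequential(              # fine-grained temporal features
            conv_block(1, 64, fs // 2, fs // 16),
            nn.MaxPool1d(8),
            conv_block(64, 128, 8, 1),
            conv_block(128, 128, 8, 1),
            conv_block(128, 128, 8, 1),
            nn.MaxPool1d(4),
        )
        self.large = nn.Sequential(              # coarse frequency features
            conv_block(1, 64, fs * 4, fs // 2),
            nn.MaxPool1d(4),
            conv_block(64, 128, 6, 1),
            conv_block(128, 128, 6, 1),
            conv_block(128, 128, 6, 1),
            nn.MaxPool1d(2),
        )

    def forward(self, x):                        # x: (batch, 1, 30 * fs)
        a = self.small(x).flatten(1)
        b = self.large(x).flatten(1)
        return torch.cat([a, b], dim=1)          # concatenated feature vector
```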
- Sequence Residual Learning:
- This part has two components: bidirectional LSTMs and a shortcut connection.
- The bidirectional LSTMs learn temporal information (sleep stage transition rules).
- The LSTMs use the peephole mechanism, in which the gates inspect the current memory cell state before modifying it.
- The shortcut connection is used to add the temporal information learned from previous input sequences to the features extracted by the CNNs.
- An FC layer transforms the CNN features into vectors that can be added to the LSTM output.
- The FC layer performs a matrix multiplication, batch norm, and a ReLU activation, as in the sketch below.
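A sketch of the sequence residual learning part, assuming the 3200-dim concatenated features produced by the CNN sketch above (at Fs = 100) and two LSTM layers with 512 hidden units per direction, which is my reading of the paper. `nn.LSTM` does not support peephole connections, so this sketch omits them.

```python
import torch
import torch.nn as nn

class SequenceResidual(nn.Module):
    """Bidirectional LSTM over a sequence of per-epoch CNN features, plus a
    shortcut (FC -> batch norm -> ReLU) added to the LSTM output."""

    def __init__(self, feat_dim=3200, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.shortcut = nn.Sequential(       # project features to 2 * hidden
            nn.Linear(feat_dim, 2 * hidden),
            nn.BatchNorm1d(2 * hidden),
            nn.ReLU(),
        )

    def forward(self, feats):                # feats: (batch, seq_len, feat_dim)
        out, _ = self.lstm(feats)            # temporal (transition) information
        b, t, d = feats.shape
        sc = self.shortcut(feats.reshape(b * t, d)).reshape(b, t, -1)
        return out + sc                      # residual addition
```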
- ### Training
- A two-step algorithm is used to train the model effectively and to tackle the class-imbalance problem.
- The algorithm first pretrains the representation learning part and then fine-tunes the entire model.
- **Pretraining:**
- The two CNNs are stacked with a softmax layer (not the final one shown in the paper's diagram) to pretrain the CNNs.
- The parameters of this softmax layer are discarded after pretraining.
- The pre-model is trained on a class-balanced training set using the Adam optimizer.
- The class-balanced set is obtained by duplicating the minority sleep stages in the dataset so that all stages have the same number of samples, as in the sketch below.
- The learning rate, beta1, and beta2 used are 10<sup>-4</sup>, 0.9, and 0.999.
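A small NumPy sketch of that oversampling; the helper name `oversample_balance` and the shuffling details are mine, not from the paper.

```python
import numpy as np

def oversample_balance(x, y, seed=0):
    """Duplicate epochs of minority sleep stages until every class
    has as many samples as the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sample (with replacement) the extra copies needed to reach n_max.
        extra = rng.choice(c_idx, n_max - len(c_idx), replace=True)
        idx.append(np.concatenate([c_idx, extra]))
    idx = rng.permutation(np.concatenate(idx))
    return x[idx], y[idx]

# Pretraining would then use Adam with the hyper-parameters above, e.g.:
# torch.optim.Adam(pre_model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```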
- **Fine-tuning:**
- The CNNs are initialized with the parameters from the pre-model and are trained with a learning rate lr1 that is lower than the one used in pretraining.
- The sequence learning part is trained with a higher learning rate lr2.
- This is because, when a single learning rate was used, the CNN parameters adjusted excessively to the sequential data, which was not class-balanced, leading to overfitting.
- Heuristic gradient clipping is used to avoid exploding gradients.
- Dropout of 0.5 was used to reduce overfitting (applied only during training).
- L2 weight decay was also used, with lambda = 10<sup>-3</sup>.
- lr1 and lr2 are 10<sup>-6</sup> and 10<sup>-4</sup> respectively; see the sketch below.
- For batch norm, the mean and variance of the training set, used as fixed parameters during testing, were estimated by computing a moving average (decay rate 0.999) of the sample mean and variance of each mini-batch.
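A hedged sketch of this fine-tuning setup, reusing the module names from the sketches above. Adam's `weight_decay` stands in for the L2 penalty, and the clipping threshold is an assumed value, since the notes do not give one.

```python
import torch
import torch.nn as nn

def make_finetune_optimizer(cnn: nn.Module, seq: nn.Module):
    """Two parameter groups: the pretrained CNNs get the lower rate lr1,
    the sequence residual part (trained from scratch) the higher rate lr2."""
    return torch.optim.Adam(
        [
            {"params": cnn.parameters(), "lr": 1e-6},   # lr1
            {"params": seq.parameters(), "lr": 1e-4},   # lr2
        ],
        betas=(0.9, 0.999),
        weight_decay=1e-3,  # stands in for the L2 penalty with lambda = 1e-3
    )

def train_step(model, criterion, optimizer, x, y, clip=10.0):
    """One fine-tuning step with heuristic gradient clipping."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Bound the gradient norm so backprop through the LSTMs cannot explode;
    # the threshold of 10.0 is an assumption, not from the notes.
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
    optimizer.step()
    return loss.item()
```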
- ### Evaluation
- The datasets used were Sleep-EDF and MASS, with a subject-independent train/test split.
- k-fold cross-validation was used to evaluate the model, with k = 31 for MASS and k = 20 for Sleep-EDF.
- In each fold, N - (N/k) subjects are used for training and N/k for testing, where N is the number of subjects.
- This process is repeated k times so that all recordings are tested.
- The predicted sleep stages from all folds are then pooled and the metrics are computed on the pooled predictions, as in the sketches below.
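A sketch of such a subject-independent k-fold split, assuming `subject_ids` labels each 30-second epoch with its subject; the function name and the shuffling are illustrative.

```python
import numpy as np

def subject_kfold(subject_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs where folds are formed over
    subjects, so no subject appears in both sets of the same fold."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    for test_subjects in np.array_split(subjects, k):
        test_mask = np.isin(subject_ids, test_subjects)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```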
- The performance metrics used are per-class precision (PR), per-class recall (RE), per-class F1 score (F1), macro-averaged F1 score (MF1), and overall accuracy.
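These metrics can be computed with scikit-learn on the pooled predictions; `y_true` and `y_pred` here are dummy placeholders for the pooled labels and model outputs.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_recall_fscore_support)

# Dummy stand-ins for the sleep stages pooled across all k test folds.
y_true = np.array([0, 1, 2, 3, 4, 1, 2, 2])
y_pred = np.array([0, 1, 2, 3, 4, 2, 2, 1])

pr, re, f1, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)
mf1 = f1_score(y_true, y_pred, average="macro")   # macro-averaged F1 (MF1)
acc = accuracy_score(y_true, y_pred)              # overall accuracy
```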
- The results show that performance on stage N1 is noticeably lower than on the other stages, with a per-class F1 below 60, while the other stages scored around 80-90.