# TinySleepNet (2020)
>[source](https://pubmed.ncbi.nlm.nih.gov/33018069/)
- ### Abstract
- Most EEG-based models are over-engineered, with many layers or additional processing steps (e.g. converting EEG to spectrogram-based images).
- They also require training on large datasets to avoid overfitting.
- This paper proposes TinySleepNet, a model that uses a smaller deep learning network with raw single-channel EEG as input.
- The model is an improved version of DeepSleepNet ([notes](https://hackmd.io/cW_57wdUQ7-2dVyblDbrnQ)) that uses data augmentation to overcome the overfitting problem.
- ### Model
- **Architecture:**
- Representation Learning:
- The CNN has 4 conv layers interleaved with 2 max-pooling and 2 dropout layers.
- This extracts time-invariant features from raw EEG signals.
- Instead of the two CNN branches in DeepSleepNet, a single CNN is used, since it has been shown that a stack of small conv layers has an effect similar to one conv layer with a large filter (referencing VGGNet); see the sketch below.
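A minimal PyTorch sketch of this single-branch CNN. The filter counts, kernel sizes, strides, and dropout rate here are illustrative assumptions, not the paper's exact values:

```python
import torch
import torch.nn as nn

class RepresentationCNN(nn.Module):
    """Sketch: 4 conv layers with 2 max-pooling and 2 dropout layers
    over one raw 30-s single-channel EEG epoch (sizes assumed)."""
    def __init__(self, fs=100, n_filters=128):
        super().__init__()
        self.features = nn.Sequential(
            # one wide first filter over the raw signal (width/stride assumed)
            nn.Conv1d(1, n_filters, kernel_size=fs // 2, stride=fs // 4, padding=fs // 4),
            nn.ReLU(),
            nn.MaxPool1d(8, stride=8),
            nn.Dropout(0.5),
            # VGG-style stack of small filters, mimicking one large filter
            nn.Conv1d(n_filters, n_filters, kernel_size=8, stride=1, padding=4),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=8, stride=1, padding=4),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=8, stride=1, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(4, stride=4),
            nn.Dropout(0.5),
        )

    def forward(self, x):           # x: (batch, 1, 30 * fs)
        return self.features(x).flatten(1)   # time-invariant feature vector
```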
- Sequence Learning:
- This part consists of a single unidirectional LSTM layer followed by a dropout layer.
- It learns temporal information such as sleep-stage transition rules.
- Unidirectional LSTMs are used instead of the bidirectional ones in DeepSleepNet to reduce computation.
- Since not many layers are needed, the residual connection DeepSleepNet uses to mitigate the vanishing gradient problem is removed (see the sketch below).
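A hedged PyTorch sketch of the sequence-learning part; the hidden size, dropout rate, and 5-class output head are assumptions:

```python
import torch
import torch.nn as nn

class SequenceLSTM(nn.Module):
    """Sketch: one unidirectional LSTM over per-epoch CNN features,
    followed by dropout and a 5-stage classification head."""
    def __init__(self, feat_dim, hidden=128, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # unidirectional by default
        self.drop = nn.Dropout(0.5)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, feats, state=None):    # feats: (batch, seq_len, feat_dim)
        out, state = self.lstm(feats, state)  # state can be carried across sequences
        return self.head(self.drop(out)), state
```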
- Training:
- The model is trained end-to-end with mini-batch gradient descent and employs signal and sequence augmentation, along with a weighted cross-entropy loss, to address class imbalance (weights in favor of N1).
- Signal augmentation:
- It shifts the EEG signal along the time axis (see the sketch below).
- The shift duration is uniformly sampled from the range ±B<sub>sig</sub>% of the EEG epoch duration.
- This helps synthesize new signal patterns for each training epoch.
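A minimal NumPy sketch of this time-shift, assuming a 1-D whole-night signal; zero-padding at the boundary is an assumption, as the paper does not specify the padding strategy:

```python
import numpy as np

def shift_signal(signal, fs=100, epoch_sec=30, b_sig=10):
    """Shift a 1-D EEG signal by a duration uniformly sampled from
    +/- b_sig% of one 30-s epoch (zero-padding is assumed)."""
    max_shift = int(epoch_sec * fs * b_sig / 100)
    shift = np.random.randint(-max_shift, max_shift + 1)
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted
```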
- Sequence augmentation:
- Here the starting point of the sequence of EEG epochs from each recording is randomly chosen (see the sketch below).
- The number of EEG epochs skipped at the beginning is uniformly sampled from the range 0 to B<sub>seq</sub> (0 for no skipping, B<sub>seq</sub> for maximum skipping).
- This helps generate new batches of multiple sequences of EEG epochs for mini-batch gradient descent.
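A minimal NumPy sketch of this skipping step; the function name and array layout are assumptions:

```python
import numpy as np

def skip_start(epochs, labels, b_seq=5):
    """Randomly drop 0..b_seq EEG epochs from the start of a recording
    so each training pass yields differently aligned sequences."""
    skip = np.random.randint(0, b_seq + 1)   # 0 = no skipping
    return epochs[skip:], labels[skip:]
```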
- In this manner, training does not require pretraining the network on an oversampled, class-balanced dataset (as DeepSleepNet does).
- The Adam optimizer was used for 200 epochs with lr, beta1, and beta2 set to 10<sup>-4</sup>, 0.9, and 0.999.
- Minibatch size is 20 and sequence length is 15.
- L2 weight decay with lambda 10<sup>-3</sup> and gradient clipping with threshold 5 are also used.
- B<sub>seq</sub> = 5 and B<sub>sig</sub> = 10
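Putting these hyperparameters together in a hedged PyTorch sketch; the stand-in model and the exact class-weight values (only N1 upweighted) are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)                                   # stand-in for TinySleepNet
class_weights = torch.tensor([1.0, 1.5, 1.0, 1.0, 1.0])    # assumed: N1 upweighted (W, N1, N2, N3, REM)
criterion = nn.CrossEntropyLoss(weight=class_weights)      # weighted cross entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999),
                             weight_decay=1e-3)             # approximates the L2 penalty

x = torch.randn(20, 10)                                    # minibatch size 20
y = torch.randint(0, 5, (20,))
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clipping threshold 5
optimizer.step()
```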
- ### Evaluation
- Datasets used are MASS and Sleep-EDF.
- k-fold cross-validation was used to evaluate the model performance.
- Metrics calculated are precision, recall, per-class F1 score, macro-averaged F1 score, overall accuracy, and Cohen's kappa coefficient (see the sketch after this list).
- The F1 score for N1 is lower than those of the other classes but is comparable to SOTA scores (50-60).
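A minimal scikit-learn sketch of computing these metrics; the toy hypnogram labels and predictions are made up for illustration:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             f1_score, precision_score, recall_score)

y_true = [0, 1, 2, 2, 3, 4, 1]   # toy stages: W, N1, N2, N2, N3, REM, N1
y_pred = [0, 2, 2, 2, 3, 4, 1]

per_class_f1 = f1_score(y_true, y_pred, average=None)      # one F1 per stage
macro_f1 = f1_score(y_true, y_pred, average="macro")       # MF1
precision = precision_score(y_true, y_pred, average=None, zero_division=0)
recall = recall_score(y_true, y_pred, average=None, zero_division=0)
accuracy = accuracy_score(y_true, y_pred)                  # overall accuracy
kappa = cohen_kappa_score(y_true, y_pred)                  # Cohen's kappa
```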