# SalientSleepNet (2021)
>[source](https://arxiv.org/pdf/2105.13864.pdf)
- ### Abstract
- The model proposed in the paper uses a U<sup>2</sup>-structure stream to detect salient waves. This is inspired by a similar structure used in computer vision.
- A multi-scale extraction module is used to capture sleep transition rules. It is made up of multiple dilated convolutions with different receptive fields, so that rules at several time scales are captured.
- A multimodal attention module is used to capture information from different multimodal data.
- It has fewer parameters than existing deep neural networks for sleep staging.
- ### Model
- It uses a U<sup>2</sup> structure to capture salient waves in the EEG and EOG modalities.
- Multi-scale extraction module to learn multi-scale sleep transition rules.
- Multimodal attention module to fuse attributes from EEG and EOG streams.
- A segment-wise classifier along with a bottleneck layer to reduce computational cost.
- **Two Stream U<sup>2</sup> structure**:
- Sleep stages are classified using salient waves in EEG and EOG.
- Current methods convert the raw signals into time-frequency images, which may cause information loss.
- A two-stream U<sup>2</sup> structure is used for the EEG and EOG signals to learn distinctive features.
- Each stream is an encoder-decoder structure with multiple nested U-units.
- Residual connections mitigate the degradation problem in deep networks.
- The encoder has five U-units and the decoder has four.
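
As a rough illustration of the encoder-decoder idea inside a single U-unit, the sketch below operates on one 1D channel in plain Python. The 3-tap moving average (a stand-in for a learned convolution), the pooling and upsampling choices, and the names `smooth`, `downsample`, `upsample` and `u_unit` are all assumptions made for illustration, not the paper's implementation.

```python
def smooth(x):
    """Stand-in for a learned convolution: 3-tap moving average, same length."""
    padded = [x[0]] + list(x) + [x[-1]]  # replicate the edge samples
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]


def downsample(x):
    """Average-pool by a factor of 2 (the length must be even)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def u_unit(x, depth):
    """Toy U-unit: encode `depth` levels down, decode back up, add a residual."""
    x_in = smooth(x)
    skips, h = [], x_in
    for _ in range(depth):            # encoder path
        h = smooth(h)
        skips.append(h)               # keep features for the skip connection
        h = downsample(h)
    h = smooth(h)                     # bottom of the U
    for skip in reversed(skips):      # decoder path
        h = upsample(h)
        h = smooth([a + b for a, b in zip(h, skip)])
    # Residual connection: add the unit's input features to its output.
    return [a + b for a, b in zip(h, x_in)]


signal = [0.0] * 16
signal[8] = 1.0                       # a single "salient" spike
out = u_unit(signal, depth=2)
```

The output has the same length as the input, which is what allows such units to be nested inside a larger encoder-decoder.
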
- **Multi-scale extraction module (MSE):**
- To learn sleep transition rules effectively, the MSE uses dilated convolutions with different dilation rates.
- The module applies four dilated convolutions to the same input feature map.
- The outputs of the dilated convolutions are concatenated and passed to a bottleneck layer.
- The bottleneck layer reduces the number of channels so that C<sub>out</sub> = C<sub>in</sub> / downsampling rate.
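
The MSE steps above can be sketched in plain Python for a single input channel. The dilation rates `(1, 2, 3, 4)`, the fixed kernel, the channel-averaging stand-in for a learned bottleneck, and the function names are assumptions for illustration only.

```python
def dilated_conv1d(x, kernel, dilation):
    """1D dilated convolution with zero padding, same output length.

    Assumes an odd kernel size so the padding is symmetric.
    """
    pad = dilation * (len(kernel) - 1) // 2
    n = len(x)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j * dilation - pad   # taps are `dilation` samples apart
            if 0 <= idx < n:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def mse_module(x, kernel, dilation_rates, downsampling_rate):
    """Run all dilated branches on the same input, concatenate, then reduce."""
    # One output channel per dilation rate; the list acts as the channel axis.
    concat = [dilated_conv1d(x, kernel, d) for d in dilation_rates]
    c_in = len(concat)
    if c_in % downsampling_rate != 0:
        raise ValueError("channel count must be divisible by the downsampling rate")
    c_out = c_in // downsampling_rate      # C_out = C_in / downsampling rate
    group = c_in // c_out
    # Stand-in for a learned bottleneck: average each group of channels.
    reduced = [
        [sum(vals) / group for vals in zip(*concat[g * group:(g + 1) * group])]
        for g in range(c_out)
    ]
    return concat, reduced


x = [0.0] * 17
x[8] = 1.0
concat, reduced = mse_module(x, kernel=[1.0, 1.0, 1.0],
                             dilation_rates=(1, 2, 3, 4), downsampling_rate=2)
```

With a kernel of size 3, `receptive_field` grows from 3 samples at dilation 1 to 9 samples at dilation 4, which is how the branches see transitions at different time scales without adding parameters.
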
- **Multimodal attention module (MMA):**
- Different modalities have distinctive features that contribute to classifying sleep stages, which existing models do not account for.
- The MMA component fuses the feature maps from the two streams and applies channel-wise attention to them in an implicit manner.
- The fusion method is as follows:
- 
- X<sub>fuse</sub> is fed into a channel-wise attention block, and the resulting attention weights are multiplied element-wise with X<sub>fuse</sub>.
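
A minimal sketch of fusion followed by channel-wise attention is given below. The fusion formula is not reproduced in these notes, so the sketch assumes X<sub>fuse</sub> = X<sub>EEG</sub> + X<sub>EOG</sub> + X<sub>EEG</sub> ⊙ X<sub>EOG</sub>. The attention block is likewise simplified to global average pooling followed by a sigmoid, with no learned layers. Both choices, and all function names, are assumptions for illustration.

```python
import math


def fuse(x_eeg, x_eog):
    """Assumed fusion: element-wise sum plus element-wise product."""
    return [
        [a + b + a * b for a, b in zip(ch_eeg, ch_eog)]
        for ch_eeg, ch_eog in zip(x_eeg, x_eog)
    ]


def channel_attention(x):
    """One weight per channel: global average pool, then sigmoid.

    A real block would put learned layers between the two steps.
    """
    pooled = [sum(ch) / len(ch) for ch in x]
    return [1.0 / (1.0 + math.exp(-p)) for p in pooled]


def mma(x_eeg, x_eog):
    """Fuse two [channels][time] feature maps and reweight each channel."""
    x_fuse = fuse(x_eeg, x_eog)
    weights = channel_attention(x_fuse)
    # Multiply each channel of X_fuse element-wise by its attention weight.
    return [[w * v for v in ch] for w, ch in zip(weights, x_fuse)]


x_eeg = [[1.0, 2.0], [0.0, 0.0]]
x_eog = [[1.0, 0.0], [0.0, 0.0]]
x_att = mma(x_eeg, x_eog)
```
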
- **Segment-wise classification:**
- Salient object detection models in computer vision are pixel-wise classifiers, which cannot be applied directly to physiological signals.
- Hence a segment-wise classifier is used, which maps the pixel-wise feature maps to a segment-wise sequence of predicted labels.
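
One simple way to picture the mapping from per-sample ("pixel-wise") scores to one label per segment is to average the scores inside each segment and take the highest-scoring class. The sketch below assumes exactly that; the pooling choice, the fixed segment length and the function name are illustrative assumptions.

```python
def segment_wise_labels(scores, segment_len):
    """Map [classes][time] scores to one predicted class index per segment."""
    n_classes = len(scores)
    n_samples = len(scores[0])
    if n_samples % segment_len != 0:
        raise ValueError("signal length must be a multiple of segment_len")
    labels = []
    for start in range(0, n_samples, segment_len):
        # Average each class's scores over the samples in this segment.
        pooled = [
            sum(scores[c][start:start + segment_len]) / segment_len
            for c in range(n_classes)
        ]
        # Pick the class with the highest pooled score.
        labels.append(max(range(n_classes), key=lambda c: pooled[c]))
    return labels


# Two classes, four samples, two samples per segment (toy values).
scores = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.8],
]
labels = segment_wise_labels(scores, segment_len=2)
```
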
- ### Evaluation
- The Sleep-EDF dataset is used for evaluation.
- The F1 score for the N1 stage is about 4 percentage points higher than that of the other baseline models (56.2 versus roughly 52).
- A few mixed models can achieve better accuracy, but they are difficult to tune and optimize and do not use salient wave features.
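
For reference, a per-stage F1 score such as the one quoted for N1 is the harmonic mean of precision and recall for that stage. The sketch below computes it from scratch; the labels are made-up toy data, not results from the paper.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for one class, computed from true and predicted label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0                      # avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy hypnogram: one N1 epoch is misclassified as N2.
y_true = ["W", "N1", "N1", "N2", "N1"]
y_pred = ["W", "N1", "N2", "N2", "N1"]
n1_f1 = f1_for_class(y_true, y_pred, "N1")
```
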