# SalientSleepNet (2021)

> [source](https://arxiv.org/pdf/2105.13864.pdf)

### Abstract

- The model proposed in the paper uses a U<sup>2</sup>-structure stream to detect salient waves. This is inspired by a similar structure used in computer vision.
- A multi-scale extraction module is used to capture sleep transition rules. It is made up of multiple dilated convolutions with different receptive fields for capturing multi-scale rules.
- A multimodal attention module is used to capture information from the different modalities of data.
- It has fewer parameters than existing deep neural networks for sleep staging.

### Model

- It uses a U<sup>2</sup> structure to capture salient waves in the EEG and EOG modalities.
- A multi-scale extraction module learns multi-scale sleep transition rules.
- A multimodal attention module fuses features from the EEG and EOG streams.
- A segment-wise classifier, along with a bottleneck layer, reduces computational cost.

![](https://i.imgur.com/XAtbWX6.png)

**Two-stream U<sup>2</sup> structure:**

- Sleep stages are classified using salient waves in EEG and EOG.
- Current methods convert the raw signals into time-frequency images, which may cause information loss.
- A two-stream U<sup>2</sup> structure is used for the EEG and EOG signals to learn distinctive features.
- Each stream is an encoder-decoder structure with multiple nested U-units.
- Residual connections reduce the degradation problem in deep networks.
- The encoder has 5 U-units and the decoder has 4 U-units.

**Multi-scale extraction module (MSE):**

![](https://i.imgur.com/9WoFq3h.png)

- To learn sleep transition rules effectively, the MSE uses dilated convolutions with different dilation rates.
- The module applies 4 dilated convolutions to the same input feature map.
- The results of the dilated convolutions are concatenated and passed to a bottleneck layer.
- The bottleneck layer reduces the channels such that C<sub>out</sub> = C<sub>in</sub> / downsampling rate.
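The MSE idea above can be sketched as follows: parallel dilated convolutions over the same input, concatenated channel-wise, then a 1×1 bottleneck that shrinks the channel count by the downsampling rate. This is a minimal 1-D NumPy illustration, not the paper's implementation; kernel sizes, dilation rates, and the random mixing matrix are assumptions (in the real model these are learned).

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution on a single channel (illustrative)."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += kernel[j] * xp[i + j * dilation]
    return out

def mse_module(x, kernels, dilation_rates, downsampling_rate=2):
    """Sketch of the MSE: parallel dilated convs on the same input feature map,
    concatenated channel-wise, then a 1x1 bottleneck so that
    C_out = C_in / downsampling_rate."""
    # parallel dilated convolutions with different receptive fields
    branches = np.stack([dilated_conv1d(x, k, d)
                         for k, d in zip(kernels, dilation_rates)])  # (C_in, T)
    c_in = branches.shape[0]
    c_out = c_in // downsampling_rate
    # 1x1 bottleneck: a (C_out, C_in) channel-mixing matrix
    # (random here purely for illustration; learned in the actual model)
    w = np.random.default_rng(0).normal(size=(c_out, c_in))
    return w @ branches                                              # (C_out, T)
```

For example, four 3-tap kernels with dilation rates 1–4 over a 16-sample signal yield a `(4, 16)` concatenated map that the bottleneck reduces to `(2, 16)`.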
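Stepping back to the two-stream U<sup>2</sup> structure: each nested U-unit is an encoder-decoder with a residual connection from input to output. A toy sketch, assuming average-pool downsampling and nearest-neighbour upsampling (the paper's actual U-units use learned convolutional layers):

```python
import numpy as np

def u_unit(x):
    """Toy U-unit: encode by downsampling, decode by upsampling back,
    then add a residual connection from input to output.
    (Assumes len(x) is even; real U-units use learned conv layers.)"""
    enc = x.reshape(-1, 2).mean(axis=1)   # encoder: average-pool by 2
    dec = np.repeat(enc, 2)               # decoder: nearest-neighbour upsample
    return x + dec                        # residual connection eases degradation
```

The residual add is what lets many such units be stacked (5 in the encoder, 4 in the decoder) without the degradation problem.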
**Multimodal attention module (MMA):**

![](https://i.imgur.com/z9bkWlz.png)

- Different modalities have distinctive features that contribute to classifying sleep stages, which aren't accounted for in existing models.
- The MMA component fuses the feature maps from the two streams and applies channel-wise attention in an implicit manner.
- The fusion method is:

![](https://i.imgur.com/q2dmw6x.png)

- The fused map is fed into a channel-wise attention block whose output is multiplied element-wise with X<sub>fuse</sub>.

**Segment-wise classification:**

- The salient object detection models in computer vision are pixel-wise classifiers, which cannot be directly applied to physiological signals.
- Hence a segment-wise classifier maps the pixel-wise feature maps to a sequence of segment-wise predicted labels.

![](https://i.imgur.com/w5Xssf7.png)

### Evaluation

- The Sleep-EDF dataset is used for evaluation.
- The N1 F1 score is better than other baseline models by 4% (56.2 vs. around 52 for the others).
- A few mixed models can achieve better accuracy, but they are difficult to tune and optimize, and they do not use salient wave features.
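Returning to the MMA module: its fuse-then-gate idea can be sketched as below. The paper's exact fusion formula is given only in the figure, so concatenation is assumed here, and the channel-wise attention is shown as a tiny squeeze-style gate with illustrative weight matrices `w1` and `w2` (learned in the real model).

```python
import numpy as np

def mma_module(x_eeg, x_eog, w1, w2):
    """Sketch of the MMA: fuse the two streams' feature maps (concatenation
    assumed), compute a channel-wise attention gate, and multiply it
    element-wise with X_fuse."""
    x_fuse = np.concatenate([x_eeg, x_eog], axis=0)   # (C, T) fused map
    s = x_fuse.mean(axis=1)                           # squeeze: per-channel average
    h = np.maximum(w1 @ s, 0.0)                       # excitation: tiny ReLU layer
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))               # sigmoid gate per channel
    return x_fuse * a[:, None]                        # element-wise reweighting
```

Because the gate `a` lies in (0, 1) per channel, the module can suppress channels from one modality and emphasise channels from the other without an explicit modality-selection step.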
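The segment-wise classifier described above can likewise be sketched: pool the sample-level ("pixel-wise") feature map over each segment, then score each segment with a linear layer. The pooling choice and the weight matrix `w` are assumptions for illustration; the paper's classifier is a learned network.

```python
import numpy as np

def segmentwise_classify(features, seg_len, w):
    """Sketch of a segment-wise classifier: average-pool the pixel-wise
    feature map over each segment of seg_len samples, score each segment
    with a linear layer, and take the arg-max class per segment."""
    c, t = features.shape
    n_seg = t // seg_len
    pooled = features[:, :n_seg * seg_len].reshape(c, n_seg, seg_len).mean(axis=2)
    logits = w @ pooled                   # (n_classes, n_seg)
    return logits.argmax(axis=0)          # one sleep-stage label per segment
```

This is what turns a dense per-sample feature map into one predicted sleep stage per 30-second epoch, rather than one prediction per sample as in salient object detection.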