# Convolution- and Attention-Based Neural Network for Automated Sleep Stage Classification (2020)

>[source](https://www.mdpi.com/1660-4601/17/11/4152)

### Abstract

- Scoring sleep epochs requires attention to specific signal characteristics (K-complexes and spindles).
- A CNN model with an attention mechanism is devised to perform automatic sleep staging.
- The CNN learns local signal characteristics, while the attention mechanism learns intra- and inter-epoch features.

### Model

- A neural network based on a CNN and an attention mechanism performs automated sleep stage classification from a single-channel raw EEG signal.
- The network uses a CNN to extract local signal features and multilayer attention networks to learn intra- and inter-epoch features.
- For the unbalanced dataset, the proposed method uses a weighted loss function during training to improve performance on minority classes.
- The model outperforms other methods on the sleep-edf and sleep-edfx datasets under various training/testing partitioning schemes, without changing the model's structure or any of its parameters.
- The datasets used are sleep-edf and sleep-edfx, both available on PhysioNet.
- The sleep-edf database contains eight sleep records: four from healthy subjects and four from subjects with sleep disorders. Sleep-edfx contains 197 records from 61 healthy individuals and 20 individuals with sleep disorders.
- Preprocessing:
    - Results were compared using the complete sleep-edf database and the first 20 healthy individual records (subjects 0–19) from the sleep-edfx database.
    - For each record in the sleep-edfx dataset, 30 min of wake-stage data were retained from before the first sleep epoch and from after the final sleep epoch.
    - The model used the Fpz-Cz channel as input.
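The wake-trimming step above can be sketched as follows. This is a hypothetical helper, not the authors' code; it assumes 30-s scoring epochs and a 100 Hz sampling rate for the Fpz-Cz channel (the rate used in the sleep-edf recordings), with stage code 0 denoting wake.

```python
import numpy as np

FS = 100          # Hz; assumed sampling rate of the Fpz-Cz channel
EPOCH_SEC = 30    # one scoring epoch = 30 s -> 3000 samples

def trim_and_epoch(signal, labels, wake_min=30):
    """Keep wake_min minutes of wake before the first and after the last
    sleep epoch, then slice the signal into 30-s epochs.
    signal: 1-D array of raw EEG samples; labels: per-epoch stage codes
    aligned to 30-s epochs (0 = wake). Hypothetical helper for illustration."""
    epoch_len = FS * EPOCH_SEC
    sleep_idx = np.flatnonzero(labels != 0)       # indices of non-wake epochs
    pad = wake_min * 60 // EPOCH_SEC              # number of epochs in 30 min
    start = max(sleep_idx[0] - pad, 0)
    end = min(sleep_idx[-1] + pad, len(labels) - 1)
    kept_labels = labels[start:end + 1]
    sig = signal[start * epoch_len:(end + 1) * epoch_len]
    return sig.reshape(-1, epoch_len), kept_labels
```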
- Due to differences between individuals, collection equipment, and environments, the resulting data distributions also differ markedly, which makes the model difficult to train.
- To make training more stable, z-score normalization was performed on the data from each individual.
- ![](https://i.imgur.com/4TiY5pp.png)
- There are two types of data partitioning methods for clinical data: subject-wise and record-wise (called epoch-wise in the paper), also known as the independent and non-independent methods.
- The paper uses the epoch-wise method on the sleep-edf dataset and the subject-wise method on the sleep-edfx dataset.
- In the epoch-wise method, the dataset is shuffled before partitioning.
- The subject-wise method partitions based on subjects.
- ![](https://i.imgur.com/cBE8LgA.png)

### Architecture

- The model is divided into 3 components:
    - Window feature learning
    - Intra-epoch feature learning
    - Inter-epoch feature learning
- The model feeds multiple signal windows to window feature learning in parallel; a CNN constructs a feature vector for each window.
- Intra-epoch learning uses a self-attention mechanism to obtain the weight of each signal window within an epoch and then performs a weighted addition to get the epoch feature.
- The window features are updated by a feed-forward layer.
- Inter-epoch learning also uses self-attention, to learn the temporal dependency between the current epoch and adjacent epochs.
- The window length is 200 samples with an overlap of 100 samples, giving 29 windows per epoch.
- ![](https://i.imgur.com/WWFzREm.png) ![](https://i.imgur.com/Cc6CIXx.png)
- The window feature model has 5 convolutional blocks and a global average pooling (GAP) layer, as per the diagram.
- The intra- and inter-epoch feature modules each have a positional embedding, two attention blocks, and one GAP layer.
- The difference is in the inputs: (29, 256) for intra-epoch learning and (3, 256) for inter-epoch learning.
- After the previous three components, the feature vector of the current epoch has shape (1, 256).
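The normalization and windowing steps above can be sketched as below. The function names are hypothetical; the window arithmetic follows the paper: a 30-s epoch at 100 Hz has 3000 samples, and windows of length 200 with stride 100 (overlap 100) give (3000 − 200) / 100 + 1 = 29 windows.

```python
import numpy as np

def zscore(x):
    """Per-individual z-score normalization (the paper normalizes each
    individual's data; normalizing over the whole recording is an assumption)."""
    return (x - x.mean()) / x.std()

def to_windows(epoch, win=200, step=100):
    """Slice one 30-s epoch (3000 samples at 100 Hz) into overlapping
    windows: (3000 - 200) // 100 + 1 = 29 windows of length 200."""
    n = (len(epoch) - win) // step + 1
    return np.stack([epoch[i * step:i * step + win] for i in range(n)])

windows = to_windows(zscore(np.random.randn(3000)))
print(windows.shape)   # (29, 200)
```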
- The model uses two fully connected layers as the classifier and outputs the class probability of each stage for the current epoch.
- The first fully connected layer contains ReLU and dropout layers.
- The second fully connected layer connects to a softmax layer, which normalizes the output probabilities.
- ![](https://i.imgur.com/W07TfAA.png)

### Training and Testing

- Weighted cross-entropy loss is used in training to reduce the impact of unbalanced data.
- ![](https://i.imgur.com/qeBQAwC.png)
- The weight βi corresponds to the true category yi.
- The Adam optimizer is used with the LookAhead mechanism, with an initial learning rate of 1e-4, a rate decay of 2e-4, and a gradient clip value of 0.1.
- Testing:
    - Different partitioning was applied to the two datasets.
    - Sleep-edf used a 70% training / 30% testing epoch-wise partition and was trained for 100 epochs (training iterations, not sleep epochs).
    - Sleep-edfx used subject-wise partitioning, with 19 subjects for training and 1 for testing.
    - This was repeated 20 times to evaluate on the entire dataset; each training ran only 35 epochs because of the large dataset size.
    - The early stopping strategy was not used during training.
- An ensemble method was used to improve stability.
- The principle underlying this method is that ensemble outputs are obtained by using multiple models to infer the same input and combining their outputs, where Pi(Xt) is the stage probability vector of model i for the input at time t, and yt is the final output stage.
- ![](https://i.imgur.com/DRTwhC9.png)
- The parameters from the last five training epochs are saved to obtain the multiple models.
- Evaluation:
    - Performance was evaluated per category and overall.
    - For each category, the calculated metrics were the precision, recall, and F1-score of the model.
    - For the overall evaluation, accuracy is used to obtain an intuitive understanding of the model's performance on the entire dataset.
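The weighted cross-entropy loss described above can be sketched as follows. The paper only states that the weight βi corresponds to the true category yi; how β is chosen (e.g. inverse class frequency to favor minority stages like N1) is an assumption here, and the function name is hypothetical.

```python
import numpy as np

def weighted_ce(probs, y, beta):
    """Weighted cross-entropy over a batch:
        loss = -(1/N) * sum_i beta[y_i] * log p_i[y_i]
    probs: (N, C) softmax outputs; y: (N,) true class indices;
    beta: (C,) per-class weights (choice of beta is an assumption)."""
    return float(-np.mean(beta[y] * np.log(probs[np.arange(len(y)), y])))
```

Up-weighting a minority class increases its contribution to the loss, which is how the paper improves performance on rare stages without rebalancing the data itself.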
- However, because the distribution of stages in the dataset is uneven, overall accuracy cannot reflect the model's true performance.
- To better reflect performance on imbalanced datasets, the macro-average F1-score (MF1), with C = 5 classes, is used.
- ![](https://i.imgur.com/J7m7KOw.png)

### Results and Analysis

- Classification accuracy for N1 is poor due to the small number of samples.
- When removing window feature learning, the raw window signal was used directly as input to the intra-epoch attention module.
- When removing the intra- or inter-epoch attention module, the output of the previous module was connected directly to the subsequent GAP layer.
- Taking the full model as the baseline, removing any component reduces the model's MF1 metric.
- Removing window feature learning caused the greatest decline in performance.
- Removing the weighted loss did not affect accuracy, but it did reduce the MF1 score.

### Conclusion

- A convolution- and attention-based network using a single EEG channel was used to realize automated sleep stage classification.
- The CNN is used as a feature extractor, and attention replaces the use of an RNN.
- The weighted loss function was very important in this architecture.
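The MF1 metric used throughout the evaluation can be sketched as below: per-class F1 averaged equally over the C = 5 sleep stages, so a minority class like N1 counts as much as N2. The function name is illustrative; this matches the standard macro-F1 definition.

```python
import numpy as np

def macro_f1(y_true, y_pred, C=5):
    """Macro-average F1 (MF1): mean of the per-class F1-scores over
    C classes, treating every class equally regardless of its frequency."""
    f1s = []
    for c in range(C):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```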