# Personalized Automatic Sleep Staging with Single-Night Data: a Pilot Study with KL-Divergence Regularization
Original Paper Link: [arxiv Link](https://arxiv.org/pdf/2004.11349.pdf)
# Introduction
Brain waves vary between people. One way to improve automatic sleep scoring is to personalise a pre-trained state-of-the-art (SOTA) model using a single night of labeled data from the target person. This paper achieves that with KL-divergence regularization.
The authors use a pre-trained SeqSleepNet and personalise it by fine-tuning according to the person's sleep wave characteristics.
This is done by adding the KL divergence between the output of the subject-independent (SI) model and the output of the personalized model to the loss function during fine-tuning. In effect, the KL-divergence regularization prevents the personalized model from overfitting to the single-night data and from straying too far from the SI model.
The authors achieve a personalized sleep staging accuracy of 79.6%, a Cohen’s kappa of 0.706, a macro F1-score of 73.0%, a sensitivity of 71.8%, and a specificity of 94.2%. The approach is robust against overfitting, improving accuracy by 4.5 percentage points over no personalization and by 2.2 percentage points over personalization without regularization.
# Info on SeqSleepNet
SeqSleepNet was pre-trained on the MASS dataset. The KL-divergence regularization is used here to transfer it to individual subjects from the SleepEDF dataset.
# Personalisation
The pre-trained model's initial parameters are \\(\theta\\); after personalisation on a single night of data they become \\(\theta^p\\). A channel mismatch between the source and target data is expected, and fine-tuning is expected to address both the channel mismatch and the personalisation. Four fine-tuning strategies are investigated: {All, EPB+Softmax, SPB+Softmax, Softmax}.
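The paper's SeqSleepNet is not implemented here; the following is a minimal PyTorch-style sketch of how such selective fine-tuning strategies could be expressed, where `epoch_block`, `sequence_block`, and `softmax` are hypothetical attribute names standing in for the epoch processing block (EPB), the sequence processing block (SPB), and the softmax output layer.

```python
import torch.nn as nn

def select_trainable(model: nn.Module, strategy: str) -> None:
    """Freeze all parameters, then unfreeze the groups chosen by `strategy`.

    `epoch_block`, `sequence_block`, and `softmax` are hypothetical submodule
    names for the EPB, SPB, and output layer of a SeqSleepNet-like model.
    """
    groups = {
        "All": [model],                                       # fine-tune everything
        "EPB+Softmax": [model.epoch_block, model.softmax],
        "SPB+Softmax": [model.sequence_block, model.softmax],
        "Softmax": [model.softmax],                           # output layer only
    }
    for p in model.parameters():          # freeze the whole network first
        p.requires_grad = False
    for module in groups[strategy]:       # then re-enable the selected blocks
        for p in module.parameters():
            p.requires_grad = True
```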
A previous study showed that sleep transfer learning requires data from roughly ten or more subjects, so personalization with only a single night of data from the target subject carries a substantial risk of overfitting.
To remedy overfitting, they propose to regularize the sequential classification loss function in SeqSleepNet with the KL divergence between the posterior probability outputs of the SI model \\(\theta\\) and the ones from the personalized model \\(\theta^p\\), which constrains the personalized model not to stray too far away from the SI model.
Given an input sequence \\((S_{1}, S_{2}, \ldots, S_{L})\\), the KL divergence is:
\\(D_{KL} = \frac{1}{L} \sum_{l=1}^{L} \sum_{c \in C} P_{\theta}(\hat{y}_{l} = c)\log \frac{P_{\theta}(\hat{y}_{l} = c)}{P_{\theta^{p}}(\hat{y}_{l} = c)}\\)
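A minimal sketch of this term, assuming the two models' softmax outputs for the sequence are available as \\((L, C)\\) probability tensors:

```python
import torch

def sequence_kl(p_si: torch.Tensor, p_pers: torch.Tensor) -> torch.Tensor:
    """KL divergence D_KL(P_theta || P_theta^p), averaged over the L epochs
    of the input sequence. Both tensors have shape (L, C) and hold posterior
    probabilities over the C sleep stages."""
    eps = 1e-8  # guard against log(0)
    kl_per_epoch = (p_si * ((p_si + eps).log() - (p_pers + eps).log())).sum(dim=-1)
    return kl_per_epoch.mean()  # the 1/L average over the sequence
```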
The KL-divergence regularization is added to the sequential classification loss \\(\mathcal{L}\\) to form the loss function for personalization:

\\(\mathcal{L}^{p} = (1 - \alpha)\,\mathcal{L} + \alpha\, D_{KL}\\)
where \\(\alpha \in [0,1]\\) is the KL-divergence regularization coefficient, regulating how far the personalized model may deviate from the SI model. When \\(\alpha = 0\\), the KL-divergence regularization vanishes and personalization reduces to regular fine-tuning.
In contrast, when \\(\alpha = 1\\), we trust the pretrained SI model completely and ignore all the new information in the personalization data.
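The pieces above can be combined into a single training objective. A sketch of the regularized loss under the form stated above, with hypothetical variable names (the SI model is kept frozen, so its outputs carry no gradient):

```python
import torch
import torch.nn.functional as F

def personalization_loss(logits_pers: torch.Tensor,
                         logits_si: torch.Tensor,
                         targets: torch.Tensor,
                         alpha: float) -> torch.Tensor:
    """(1 - alpha) * sequential cross-entropy on the single-night labels
    + alpha * KL(SI posteriors || personalized posteriors).

    logits_pers / logits_si: shape (L, C); targets: shape (L,) with stage indices.
    """
    eps = 1e-8
    p_pers = F.softmax(logits_pers, dim=-1)
    p_si = F.softmax(logits_si, dim=-1).detach()    # frozen SI model
    ce = F.cross_entropy(logits_pers, targets)       # averaged over the L epochs
    kl = (p_si * ((p_si + eps).log() - (p_pers + eps).log())).sum(dim=-1).mean()
    return (1.0 - alpha) * ce + alpha * kl
```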
As a result, model personalization is equivalent to changing the target distribution from that of the unknown source-domain database (the MASS database used for pretraining) to a linear interpolation of the source-domain data distribution and the personalized data distribution.
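To make the interpolation explicit, assuming the sequential classification loss is the cross-entropy averaged over the \\(L\\) epochs and using the loss form above, dropping terms that do not depend on \\(\theta^p\\) gives

\\[
\mathcal{L}^{p} = (1-\alpha)\,\mathcal{L} + \alpha\, D_{KL}
= -\frac{1}{L} \sum_{l=1}^{L} \sum_{c \in C}
\Big[ (1-\alpha)\,\mathbb{1}(y_{l}=c) + \alpha\, P_{\theta}(\hat{y}_{l}=c) \Big]
\log P_{\theta^{p}}(\hat{y}_{l}=c) + \text{const},
\\]

i.e. the effective target for each epoch is the mixture \\((1-\alpha)\,\mathbb{1}(y_{l}=c) + \alpha\, P_{\theta}(\hat{y}_{l}=c)\\) of the ground-truth label and the SI model's posterior.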