# Progression Regularisation Applications
## Longitudinal scans of each knee (Submitted)
In this framework, $x_{i1},x_{i2}$ are scans (radiography/MRI) of a knee of patient $i$ recorded over time ($x_{i1}$ recorded first and $x_{i2}$ later). The goal is to predict whether the knee will undergo total knee replacement (TKR) within 1 year. The earlier scan $x_{i1}$ always has a negative outcome $y_{i1}=0$ (i.e., no TKR within 1 year of the first scan), while the later scan's outcome $y_{i2}\in\{0,1\}$ can be either. The total loss with regularisation here is
$$loss=\sum_{i} CE\left(h(\phi(x_{i1})),y_{i1}\right)+CE\left(h(\phi(x_{i2})),y_{i2}\right)-\lambda(2y_{i2}-1)\left\Vert \phi(x_{i1})-\phi(x_{i2})\right\Vert ^{2}$$
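A minimal PyTorch sketch of this loss, assuming a feature extractor `encoder` ($\phi$) and a classification head `head` ($h$); all names and the `lam` default are illustrative, not the actual implementation:

```python
import torch
import torch.nn.functional as F

def progreg_longitudinal_loss(encoder, head, x1, x2, y1, y2, lam=0.1):
    """x1/x2: batched earlier/later scans of the same knees; y1 is all
    zeros by construction, y2 in {0, 1} flags TKR within 1 year of x2."""
    z1, z2 = encoder(x1), encoder(x2)        # phi(x_i1), phi(x_i2)
    ce = F.cross_entropy(head(z1), y1) + F.cross_entropy(head(z2), y2)
    dist_sq = (z1 - z2).pow(2).sum(dim=1)    # ||phi(x_i1) - phi(x_i2)||^2
    sign = 2.0 * y2.float() - 1.0            # +1 progressor, -1 stable
    # -lambda * (2*y_i2 - 1) * distance^2: pushes progressor pairs apart
    # in feature space and pulls stable pairs together (batch mean here)
    reg = -(lam * sign * dist_sq).mean()
    return ce + reg
```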
We observe reasonable improvements in AUC on an external dataset.
## Scans of different knees
### TKR prediction
We have 728 case-control matched TKR knees (acquired at their baseline 00m visit), with one scan per knee in the dataset. In this framework, $x_{i},x_{j}$ are scans (radiography/MRI) of two different knees $i$ and $j$. The goal is to predict total knee replacement within 9 years of the scan acquisition date. For each scan $x_{i}$, a partner scan $x_{j}$ is sampled from the training set, forming either a positive pair (i.e., $y_{j}=y_{i}$) or a negative pair ($y_{j}\neq y_{i}$), and the progreg loss is applied between the two scans.
$$loss=\sum_{i}CE\left(h(\phi(x_{i})),y_{i}\right)-\lambda(2|y_{i}-y_{j}|-1)\left\Vert \phi(x_{i})-\phi(x_{j})\right\Vert ^{2}$$
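A sketch of this pairwise variant under the same illustrative assumptions as above (hypothetical names; pair sampling is assumed to happen in the data loader). The same function also covers the KL-grade setting below, where $y\in\{0,\dots,4\}$ and $|y_{i}-y_{j}|$ can exceed 1:

```python
import torch
import torch.nn.functional as F

def progreg_pair_loss(encoder, head, xi, xj, yi, yj, lam=0.1):
    """xi/xj: batched scans of different knees paired during sampling;
    yi/yj: their labels (binary for TKR, 0-4 for KL grade)."""
    zi, zj = encoder(xi), encoder(xj)
    ce = F.cross_entropy(head(zi), yi)
    dist_sq = (zi - zj).pow(2).sum(dim=1)    # ||phi(x_i) - phi(x_j)||^2
    # (2|y_i - y_j| - 1): -1 for positive pairs (pull together),
    # positive and growing with the label difference for negative pairs
    weight = 2.0 * (yi - yj).abs().float() - 1.0
    return ce - (lam * weight * dist_sq).mean()
```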
This is equivalent to doing Siamese network training on top of cross-entropy training. We don't observe any performance gain (AUC) using this approach. In this binary case, I think both losses (CE and Siamese) force $\phi(x_{i})$ and $\phi(x_{j})$ apart for negative pairs, so the Siamese term adds little beyond what CE already achieves, which could be the reason for no improvement.
### KL grade prediction
We have around 9000 knees of patients acquired at their baseline visit in the study. The goal is to predict the Kellgren-Lawrence (KL) grade of the scan as labeled by radiologists, $y\in\{0,1,2,3,4\}$. As before, for each scan $x_{i}$, a partner scan $x_{j}$ is sampled from the training set, forming either a positive pair (i.e., $y_{j}=y_{i}$) or a negative pair ($y_{j}\neq y_{i}$), and the progreg loss is applied between the two scans.
$$loss=\sum_{i}CE\left(h(\phi(x_{i})),y_{i}\right)-\lambda(2|y_{i}-y_{j}|-1)\left\Vert \phi(x_{i})-\phi(x_{j})\right\Vert ^{2}$$
The weight of the regularisation is $(2|y_{i}-y_{j}|-1)$, which increases with the difference in KL grade (severity); the worked values are listed below. Again, this is equivalent to doing Siamese network training on top of cross-entropy training. But here the Siamese loss explicitly forces the 5 classes away from each other, with the distance depending on the difference in their severity. CE loss alone doesn't explicitly enforce this (I think): we'd still get a low loss if the classes were separated in some arbitrary arrangement, not necessarily one ordered by severity. So it could be interesting to apply this to the KLG prediction task.
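To make the severity dependence concrete, the regularisation coefficient takes these values over the possible grade differences (derived directly from the formula above):

$$2|y_{i}-y_{j}|-1=\begin{cases}-1 & |y_{i}-y_{j}|=0\quad\text{(same grade: embeddings pulled together)}\\+1 & |y_{i}-y_{j}|=1\\+3 & |y_{i}-y_{j}|=2\\+5 & |y_{i}-y_{j}|=3\\+7 & |y_{i}-y_{j}|=4\quad\text{(e.g. KL 0 vs KL 4: pushed apart hardest)}\end{cases}$$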