# Progression Regularization Explained

The goal of the task is to predict a discrete outcome $y$, where $y$ encodes the severity of some condition (for example, knee OA severity), from an input $x$. Let $x_{1}$ and $x_{2}$ be two inputs with respective outcomes $y_{1}$ and $y_{2}$. We want to learn $\phi$, a mapping from the input space $X$ to a representation space $R$, and $h(r)$, a linear mapping followed by an activation that predicts $y \in Y$. The conventional approach is to minimise the following cross-entropy loss:

$$loss = CE\left(h(\phi(x_{1})), y_{1}\right) + CE\left(h(\phi(x_{2})), y_{2}\right)$$

By minimising this loss we can learn the functions $\phi$ and $h$. But since $y$ encodes severity, we can do better by adding a regularisation term $Reg$:

$$Reg = -\lambda\left(\left|y_{1}-y_{2}\right| - \boldsymbol{1}\{y_{1}=y_{2}\}\right)\left\Vert \phi(x_{1})-\phi(x_{2})\right\Vert^{2}$$

A margin-capped variant can instead act on the predicted scores for an ordered pair with $y_{1} > y_{2}$:

$$Reg = -\min\left(m, \hat{y}_{1}-\hat{y}_{2}\right)$$

where $\hat{y}_{i}$ is the model's score for $x_{i}$ and $m$ is a margin, so the higher-severity input is encouraged to score at least $m$ above the lower-severity one without the gap being pushed beyond the margin.

The regularisation term is minimised together with the rest of the loss. If $x_{1}$ and $x_{2}$ have the same severity, i.e. $y_{1}=y_{2}$, the term reduces to $+\lambda\left\Vert \phi(x_{1})-\phi(x_{2})\right\Vert^{2}$, which forces $\phi(x_{1})$ to be close to $\phi(x_{2})$. If $x_{1}$ and $x_{2}$ have outcomes of different severity, the term becomes $-\lambda\left|y_{1}-y_{2}\right|\left\Vert \phi(x_{1})-\phi(x_{2})\right\Vert^{2}$, which pushes $\phi(x_{1})$ and $\phi(x_{2})$ apart with a weight that increases with the difference in severity. The full objective is

$$loss = CE\left(h(\phi(x_{1})), y_{1}\right) + CE\left(h(\phi(x_{2})), y_{2}\right) + Reg$$

Essentially, the goal is to force $\phi(x)$ to be a better representation of the severity of the outcome.
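To make the pairwise objective concrete, here is a minimal PyTorch-style sketch, assuming `phi` is an encoder module and `h` a linear classification head; the function name `progression_loss` and the argument `lam` (the weight $\lambda$) are illustrative choices, not part of the original description.

```python
import torch
import torch.nn.functional as F

def progression_loss(phi, h, x1, x2, y1, y2, lam=1.0):
    """Pairwise cross-entropy loss plus the severity-aware regulariser.

    phi : encoder mapping inputs x to representations r = phi(x)
    h   : linear head mapping representations to class logits
    y1, y2 : integer severity labels (LongTensor), one per example
    lam : regularisation weight (lambda); illustrative default
    """
    r1, r2 = phi(x1), phi(x2)                      # representations
    ce = F.cross_entropy(h(r1), y1) + F.cross_entropy(h(r2), y2)

    same = (y1 == y2).float()                      # indicator 1{y1 = y2}
    sev_gap = (y1 - y2).abs().float()              # |y1 - y2|
    dist_sq = (r1 - r2).pow(2).sum(dim=-1)         # ||phi(x1) - phi(x2)||^2

    # Reg = -lam * (|y1 - y2| - 1{y1 = y2}) * ||phi(x1) - phi(x2)||^2
    # same severity      -> +lam * dist_sq           (pull representations together)
    # different severity -> -lam * sev_gap * dist_sq (push apart, weighted by the gap)
    reg = -lam * (sev_gap - same) * dist_sq

    return ce + reg.mean()
```

In practice, pairs can be formed by sampling two examples per step or by taking pairs within a mini-batch, and the repulsive part can be capped in the spirit of the margin variant above, for example by clamping `dist_sq` to a maximum value before it enters the regulariser.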