# Notes on "Self-Supervised Training Enhances Online Continual Learning"

###### tags: `continual learning` `self-supervised learning`

### Author
[Rishika Bhagwatkar](https://github.com/rishika2110)

## Introduction

The authors argue that self-supervised pre-training is more effective than supervised pre-training, especially when little pre-training data is available, which is the regime most class-incremental models operate in. This is because, in supervised learning, the model learns to discriminate only amongst the available classes and hence does not produce optimal representations for unseen classes.

## Related Work

### Continual Learning

* Catastrophic forgetting: it occurs when the model overwrites the representations of previously seen data upon receiving new data.
* Catastrophic forgetting can be mitigated by:
    * Increasing the model's capacity
    * Regularisation mechanisms
    * Replay mechanisms
* Replay mechanisms have shown strong results on large-scale continual learning on ImageNet.

### Self-Supervised Learning

Self-supervised learning methods use pretext tasks to learn visual features, where the network provides its own supervision during training.

* MoCo:
    * Two different DNNs are used to compare instances.
    * One of them is updated as a momentum-weighted moving average of the weights of the other (a minimal sketch of this update is given at the end of these notes).
    * A buffer stores recent representations to be used as negative examples.
* Simple Framework for Contrastive Learning of Visual Representations (SimCLR):
    * It has only one DNN and uses a large batch size instead of a buffer.
    * Data augmentation techniques are used, and a non-linear transformation of the feature representations is applied before computing the contrastive loss.
* MoCo-V2:
    * A projection head and more data augmentation techniques are incorporated into MoCo.
* SwAV:
    * A cluster assignment is predicted instead of comparing features of instances directly.

It is shown that self-supervision surpasses supervision when less labelled data is available during pre-training.

## Algorithms

The paper compares online continual learning systems that use either self-supervised or supervised pre-training, as a function of the size of the pre-training dataset.

### Pre-Training Approaches

The following pre-training algorithms are studied:

* Supervised: the same pre-training protocol as the state-of-the-art REMIND system.
* MoCo-V2
* SwAV

### Online Continual Learning Models

The following online continual learning models that use pre-trained features were evaluated:

* Deep SLDA: the model is used with a plastic covariance matrix and a shrinkage of 1e-4 (a sketch appears at the end of these notes).
* Online Softmax with Replay: an online softmax classifier for continual learning that uses replay (sketched at the end of these notes).
* REMIND

## Experiments

Experiments were carried out on ImageNet ILSVRC-2012, a standard benchmark for assessing a continual learning model's ability to scale. Additionally, offline linear evaluation and continual learning experiments were run on the Places-365 dataset to study how well models pre-trained on ImageNet transfer to a new domain.

## Discussion and Conclusion

<!-- * SwAV outperformed supervised -->
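
The MoCo bullets above mention two encoders, one updated as a moving average of the other, and a buffer of past representations used as negatives. Below is a minimal PyTorch sketch of those two mechanics only; the encoder architecture, momentum coefficient, and queue size are arbitrary assumptions for illustration, not values from the paper.

```python
# Minimal sketch of a MoCo-style momentum update and negative-feature queue.
# Encoder architecture, momentum, and queue size are assumptions.
import torch
import torch.nn as nn

momentum = 0.999          # assumed EMA coefficient for the key encoder
queue_size = 4096         # assumed size of the negative-feature buffer
feature_dim = 128

query_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))
key_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))
key_encoder.load_state_dict(query_encoder.state_dict())  # start from identical weights

queue = torch.randn(queue_size, feature_dim)  # buffer of past key features (negatives)

@torch.no_grad()
def momentum_update():
    # The key encoder is an exponential moving average of the query encoder,
    # i.e. it is never updated by gradients directly.
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data = momentum * k_param.data + (1.0 - momentum) * q_param.data

@torch.no_grad()
def enqueue(new_keys):
    # Push the newest key features into the buffer, discarding the oldest ones.
    global queue
    queue = torch.cat([new_keys, queue], dim=0)[:queue_size]
```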
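
For the Deep SLDA entry above, here is a small NumPy sketch of streaming LDA with a plastic (online-updated) covariance matrix and the 1e-4 shrinkage mentioned in the notes. The exact update rule, dimensions, and variable names are assumptions based on standard streaming estimators and may differ from the authors' implementation.

```python
# Minimal sketch of streaming LDA with a plastic covariance matrix and shrinkage.
import numpy as np

feature_dim, num_classes = 512, 1000   # assumed dimensions
shrinkage = 1e-4                       # shrinkage value mentioned in the notes

means = np.zeros((num_classes, feature_dim))   # running per-class means
counts = np.zeros(num_classes)                 # samples seen per class
cov = np.zeros((feature_dim, feature_dim))     # shared running covariance
total = 0

def fit_one(feature, label):
    """Update the class mean and shared covariance from one (feature, label) pair."""
    global total
    delta = feature - means[label]
    counts[label] += 1
    means[label] += delta / counts[label]
    # Plastic covariance: running average of per-sample outer products (assumed form).
    total += 1
    cov[:] += (np.outer(delta, feature - means[label]) - cov) / total

def predict(feature):
    """Classify with the shrinkage-regularised precision matrix."""
    precision = np.linalg.inv((1 - shrinkage) * cov + shrinkage * np.eye(feature_dim))
    weights = means @ precision                        # (num_classes, feature_dim)
    biases = -0.5 * np.sum(weights * means, axis=1)    # (num_classes,)
    return int(np.argmax(feature @ weights.T + biases))
```

In the setting described above, `feature` would be the embedding produced for each incoming image by the frozen, pre-trained backbone.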
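
The online softmax classifier with replay can be read as a linear softmax head over pre-extracted features, updated one sample at a time together with a replayed mini-batch drawn from a buffer. The sketch below illustrates that loop; the buffer policy, replay size, and optimiser settings are arbitrary assumptions, not the authors' choices.

```python
# Minimal sketch of an online softmax classifier with replay over pre-extracted features.
import random
import torch
import torch.nn as nn

feature_dim, num_classes = 512, 1000
replay_samples, buffer_capacity = 50, 10000   # assumed replay and buffer sizes

classifier = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
buffer = []  # stores (feature, label) pairs seen so far

def observe(feature, label):
    """One online step: train on the new sample plus a replayed mini-batch."""
    batch = [(feature, label)] + random.sample(buffer, min(replay_samples, len(buffer)))
    feats = torch.stack([f for f, _ in batch])
    labels = torch.tensor([y for _, y in batch])
    optimizer.zero_grad()
    loss = criterion(classifier(feats), labels)
    loss.backward()
    optimizer.step()
    # Simple fill-until-full buffer policy (assumed); a real system might use
    # reservoir sampling or another eviction strategy.
    if len(buffer) < buffer_capacity:
        buffer.append((feature, label))
```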