---
tags: Histopathology
---

# Survey Paper: Contrastive Learning

## General Case

Among the many issues faced by deep learning, the scarcity of labeled data is one of the major ones. Annotating the immense amount of data needed by supervised learning is expensive, and in some scenarios it is even impossible for researchers to collect enough data from the specific domain. As a result, several self-supervised techniques have been developed in recent years. The main idea of self-supervised learning is quite straightforward: given that data contain much more information than their labels, deep neural networks can learn better directly from the data instead of from the labels. In the early stages of self-supervised learning, pseudo labels were generated in pretext tasks through rule-based approaches such as rotation [1], jigsaw puzzles [2], colorization [3], etc. However, the performance of self-supervised learning could hardly compete with that of supervised learning until the emergence of contrastive learning.

The main goal of contrastive learning is to learn a representation of the dataset, not of individual data points. For example, PixelCNN [4] encodes all the information in an image pixel by pixel, using the chain rule to build the data likelihood. The intrinsic structure of the data can also be discovered through the loss function. Reference [5] proposes a modified autoregressive loss function, InfoNCE,

$$
\mathcal{L}_N = - \underset{X}{\mathbb{E}}\left[\log\frac{f_k(x_{t+k}, c_t)}{\sum_{x_j\in X}f_k(x_j, c_t)}\right]
$$

whose optimization maximizes the mutual information between the context and the data. InfoNCE was first implemented for image classification in CPC [6], which cuts an image into overlapping patches and treats every other region as a negative sample for each patch. Following CPC, CMC [7] further shows that the contrastive loss can also benefit learning from multiple views.

![](https://i.imgur.com/WYP20vb.jpg)

Fig. 1. SimCLR

Summarizing these developments, the milestone of contrastive learning, SimCLR [8], was proposed. As shown in Fig. 1, SimCLR applies different data augmentations to one image to generate a positive sample pair and treats all other images in the batch as negative samples. Similar to CMC, SimCLR aims to pull representations of similar samples closer together while pushing those of dissimilar samples apart. It is worth noting that a non-linear projection head is applied before computing the loss, which avoids discarding information in the learned representation. At this point, the performance of self-supervised learning finally competes with that of supervised learning. After SimCLR, a number of contrastive models followed, each improving on different aspects: MoCo [9], BYOL [10], SwAV [11], and SimSiam [12], to name but a few.

## Applications on Histopathology

For applications of deep learning to histopathology, the shortage of labelled data is especially severe: annotation requires pathologists to examine histological slides and is considerably time-consuming. As a result, there have been attempts to apply contrastive learning to histopathology analysis, so that models can learn from both the scarce labelled data and a much larger pool of unlabelled data. Ciga et al. [13] experiment with using SimCLR [8] to pretrain a generalized feature extractor. By pretraining on multiple histopathology datasets with different organ types, staining styles, and resolutions, without any labels, the pretrained model performs better than ImageNet-pretrained networks on downstream histopathology tasks. Furthermore, using more images for contrastive pretraining further improves downstream performance. A minimal sketch of the contrastive objective behind this pretraining is given below.
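The sketch shows the NT-Xent (normalized temperature-scaled cross-entropy) loss at the heart of SimCLR [8], the same objective used for histopathology pretraining in [13]. It is an illustrative PyTorch implementation, not code from either paper; the function name, batch size, and toy inputs are assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent (InfoNCE-style) loss over a batch of positive pairs.

    z1[i] and z2[i] are projections of two augmentations of image i;
    every other sample in the 2N-sized batch acts as a negative.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                       # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # a sample is never its own negative
    # The positive for row i is its counterpart from the other augmented view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: random vectors standing in for encoder + projection-head outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2))
```

Here z1 and z2 would be outputs of the non-linear projection head mentioned above; after pretraining, the head is typically discarded and the encoder features are reused downstream.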
Lacking detailed annotations, most weakly-supervised methods rely on an ImageNet-pretrained model for their initial weights. Dehaene et al. [14] proposed instead to train in-domain weights with MoCo v2 [15] for the downstream multiple instance learning task, which significantly outperforms ImageNet initialization on the Camelyon16 and TCGA datasets. Srinidhi et al. [16] proposed a new self-supervised pretext task for histopathology, resolution sequence prediction (RSP), inspired by the way a pathologist searches for cancerous regions in a whole slide image (WSI). After applying RSP to obtain a task-agnostic model, a small portion of labelled data is used for fine-tuning, followed by semi-supervised consistency training. Their results show that the framework is label-efficient and improves task-specific semi-supervised learning on standard benchmarks. Benefiting from such self-supervision-driven training frameworks, a reliable model can be trained with only a small portion of labels; a generic sketch of the consistency objective involved is given below.
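As a rough illustration of the consistency step, the sketch below shows one generic form of a semi-supervised consistency objective, in which predictions on two perturbed views of the same unlabelled image are pushed to agree. This is a simplified PyTorch sketch under assumed names (model, noise_std, lambda_u), not the exact formulation of [16].

```python
import torch
import torch.nn.functional as F

def consistency_loss(model: torch.nn.Module, x_unlab: torch.Tensor,
                     noise_std: float = 0.1) -> torch.Tensor:
    """Penalize disagreement between predictions on two perturbed views."""
    with torch.no_grad():  # the "teacher" view provides a fixed target
        target = F.softmax(model(x_unlab + noise_std * torch.randn_like(x_unlab)), dim=1)
    pred = F.softmax(model(x_unlab + noise_std * torch.randn_like(x_unlab)), dim=1)
    return F.mse_loss(pred, target)

# Combined per-batch objective: supervised loss on the few labelled patches
# plus a weighted consistency term on the unlabelled pool, e.g.
#   loss = F.cross_entropy(model(x_lab), y_lab) + lambda_u * consistency_loss(model, x_unlab)
```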
However, current contrastive learning focuses on classification; for precise pixel-wise segmentation, ......

## References

- [1](https://arxiv.org/abs/1803.07728) S. Gidaris, P. Singh, and N. Komodakis, "Unsupervised representation learning by predicting image rotations," in International Conference on Learning Representations (ICLR), 2018.
- [2](https://link.springer.com/chapter/10.1007/978-3-319-46466-4_5) M. Noroozi and P. Favaro, "Unsupervised learning of visual representations by solving jigsaw puzzles," in European Conference on Computer Vision (ECCV), 2016.
- [3](https://openaccess.thecvf.com/content_ECCV_2018/html/Carl_Vondrick_Self-supervised_Tracking_by_ECCV_2018_paper.html) C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, and K. Murphy, "Tracking emerges by colorizing videos," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- [4](https://proceedings.neurips.cc/paper/2016/file/b1301141feffabac455e1f90a7de2054-Paper.pdf) A. van den Oord et al., "Conditional image generation with PixelCNN decoders," in Advances in Neural Information Processing Systems 29, 2016.
- [5](https://arxiv.org/pdf/1807.03748.pdf) A. van den Oord, Y. Li, and O. Vinyals, "Representation learning with contrastive predictive coding," arXiv e-prints, 2018.
- [6](http://proceedings.mlr.press/v119/henaff20a/henaff20a.pdf) O. Henaff et al., "Data-efficient image recognition with contrastive predictive coding," in International Conference on Machine Learning (PMLR), 2020.
- [7](https://link.springer.com/chapter/10.1007/978-3-030-58621-8_45) Y. Tian, D. Krishnan, and P. Isola, "Contrastive multiview coding," in European Conference on Computer Vision (ECCV), 2020.
- [8](https://arxiv.org/abs/2002.05709) T. Chen et al., "A simple framework for contrastive learning of visual representations," in International Conference on Machine Learning (PMLR), 2020.
- [9](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.pdf) K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- [10](https://proceedings.neurips.cc/paper/2020/file/f3ada80d5c4ee70142b17b8192b2958e-Paper.pdf) J.-B. Grill et al., "Bootstrap your own latent: A new approach to self-supervised learning," in Advances in Neural Information Processing Systems 33, 2020.
- [11](https://proceedings.neurips.cc/paper/2020/file/70feb62b69f16e0238f741fab228fec2-Paper.pdf) M. Caron et al., "Unsupervised learning of visual features by contrasting cluster assignments," in Advances in Neural Information Processing Systems 33, 2020.
- [12](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Exploring_Simple_Siamese_Representation_Learning_CVPR_2021_paper.pdf) X. Chen and K. He, "Exploring simple siamese representation learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- [13](https://arxiv.org/abs/2011.13971) O. Ciga, T. Xu, and A. L. Martel, "Self supervised contrastive learning for digital histopathology," Machine Learning with Applications, vol. 7, 2022.
- [14](https://arxiv.org/abs/2012.03583) O. Dehaene, A. Camara, O. Moindrot, A. de Lavergne, and P. Courtiol, "Self-supervision closes the gap between weak and strong supervision in histology," arXiv e-prints, 2020.
- [15](https://arxiv.org/abs/2003.04297) X. Chen, H. Fan, R. Girshick, and K. He, "Improved baselines with momentum contrastive learning," arXiv e-prints, 2020.
- [16](https://arxiv.org/abs/2102.03897) C. L. Srinidhi, S. W. Kim, F.-D. Chen, and A. L. Martel, "Self-supervised driven consistency training for annotation efficient histopathology image analysis," Medical Image Analysis, vol. 75, 2022.