# DeepCluster

[source](https://arxiv.org/pdf/1807.05520.pdf)

### Abstract

- Many unsupervised methods use pretext tasks to create surrogate labels, formulating the unsupervised problem as a supervised one.
- These tasks require domain-specific knowledge, so DeepCluster was proposed in the paper as an alternative.
- It is an unsupervised model that combines clustering with deep neural networks.

### Architecture

![](https://i.imgur.com/xI5fdPR.gif)

- AlexNet is the ConvNet used as the feature extractor.
- The ConvNet weights are randomly initialised, and the feature vector is taken from the layer before the classification head.
- PCA with whitening and L2 normalization is applied to these features, which are then passed to k-means for clustering.
- The cluster assignments are used as pseudo-labels on which the network is trained (with a multinomial logistic / cross-entropy loss).

### Training

- First, unlabeled data is taken from the dataset and augmentation is applied so that the clusters are invariant to transformations.
- During clustering, each image is resized to 256×256, center-cropped to 224×224, and normalized.
- During training, random augmentation is applied: the image is cropped to a random size, resized to 224×224, horizontally flipped with 50% probability, and then normalized.
- To remove color information, the normalized images are converted to grayscale and Sobel filters are applied.
- The number of clusters k must be given as input; although ImageNet has 1000 classes, taking 10000 clusters gives better results.
- The model used is AlexNet with 5 conv layers and 3 FC layers.
- While clustering, the last FC layer is removed and the feature vector from the previous layer is taken.

### Avoiding Trivial Solutions

- **Empty clusters:** When a cluster becomes empty, a non-empty cluster is randomly selected and its centroid, with a small random perturbation, is used as the new centroid for the empty cluster.
- **Trivial parametrization:** If the vast majority of images is assigned to a few clusters, the parameters will exclusively discriminate between them. A strategy to circumvent this issue is to sample images based on a uniform distribution over the classes, or pseudo-labels.

### Evaluation

- The model is evaluated on PASCAL VOC for object detection and semantic segmentation, where it outperforms other unsupervised methods.
- DeepCluster favours balanced datasets, so to show it also works on an uncurated data distribution, random Flickr images from YFCC100M are used for pretraining.
- The results show that the model is robust to a change in data distribution.
- In supervised learning, models like VGG and ResNet outperform AlexNet; to test whether the same holds in the unsupervised setting, the model is trained with VGG-16 instead of AlexNet, which gives better results, only 1.4% below the supervised topline.
- For instance-level recognition, the model is evaluated on Oxford Buildings with image retrieval as the downstream task.
- The results show that pretraining plays an important role in instance-level retrieval.

### Conclusion

- DeepCluster outperforms state-of-the-art unsupervised models on various transfer learning tasks.
- It does not require any domain-specific knowledge.
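The clustering side of the pipeline above — PCA whitening with L2 normalization, k-means pseudo-labels, empty-cluster re-seeding, and uniform sampling over pseudo-labels — can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the function names are my own, random vectors stand in for ConvNet features, and the k-means loop is a toy version without the paper's GPU optimizations.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_whiten(feats, dim=16):
    """PCA-reduce, whiten, and L2-normalize features before k-means."""
    x = feats - feats.mean(axis=0)
    cov = x.T @ x / len(x)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dim]         # keep top `dim` components
    w = eigvecs[:, order] / np.sqrt(eigvals[order] + 1e-5)  # whitening projection
    z = x @ w
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def kmeans_pseudo_labels(feats, k=10, iters=20):
    """Toy k-means; an empty cluster is re-seeded from a perturbed
    non-empty centroid, as in the 'Empty clusters' fix above."""
    centroids = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members) == 0:
                donor = rng.choice([j for j in range(k) if (labels == j).any()])
                centroids[c] = centroids[donor] + 0.01 * rng.standard_normal(feats.shape[1])
            else:
                centroids[c] = members.mean(axis=0)
    return labels

def uniform_sampler(labels, n):
    """Sample image indices uniformly over pseudo-labels, countering
    trivial parametrization when a few clusters dominate."""
    clusters = np.unique(labels)
    picks = [rng.choice(np.flatnonzero(labels == c)) for c in rng.choice(clusters, n)]
    return np.array(picks)

feats = rng.standard_normal((200, 64))   # stand-in for ConvNet features
z = pca_whiten(feats, dim=16)
labels = kmeans_pseudo_labels(z, k=10)
batch = uniform_sampler(labels, n=32)    # indices for one training batch
```

In the full algorithm this clustering step alternates with supervised training on the pseudo-labels, re-running k-means once per epoch.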