# Deep Cluster
>[source](https://arxiv.org/pdf/1807.05520.pdf)
- ### Abstract
- Many unsupervised methods use pretext tasks to create surrogate labels, reformulating the unsupervised problem as a supervised one.
- But designing these tasks requires domain-specific knowledge, so the paper proposes DeepCluster as an alternative.
- It is an unsupervised method that combines clustering with the end-to-end training of deep neural networks.
- ### Architecture
- AlexNet is the ConvNet used as the feature extractor.
- The ConvNet weights are randomly initialised, and the feature vector is taken from just before the classification head.
- PCA with whitening and L2 normalization is applied, after which the features are passed to k-means for clustering.
- The cluster assignments are used as pseudo-labels on which the network is trained with a cross-entropy (multinomial logistic) loss; the alternation is sketched below.
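
A minimal sketch of one clustering phase, assuming a PyTorch-style `model.features()` that returns the pre-classifier activations; the function name and data loader are illustrative, not from the paper's code:

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_phase(model, loader, k=10_000, pca_dim=256):
    """Extract features, reduce them, and cluster; returns one pseudo-label per image."""
    model.eval()
    feats = []
    with torch.no_grad():
        for images, _ in loader:            # the dataset's labels are ignored
            f = model.features(images)      # pre-classifier activations (assumed API)
            feats.append(torch.flatten(f, 1))
    feats = torch.cat(feats).numpy()

    # PCA to 256 dims with whitening, then L2-normalise each feature vector
    feats = PCA(n_components=pca_dim, whiten=True).fit_transform(feats)
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)

    # k-means assignments become the pseudo-labels for the next training phase
    return KMeans(n_clusters=k, n_init=1).fit_predict(feats)
```

The network is then trained for one epoch with cross-entropy on these pseudo-labels before re-clustering. The released implementation uses faiss for k-means rather than scikit-learn, which is used here only for readability.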
- ### Training
- First, unlabeled images are taken from the dataset; augmentations are applied during training so that the learned clusters are invariant to these transformations.
- During clustering, each image is resized to 256×256, center-cropped to 224×224, and then normalized.
- During training, random augmentation is applied: the image is cropped at a random size and location, resized to 224×224, horizontally flipped with 50% probability, and then normalized (see the preprocessing sketch after this list).
- To remove color information, which the clustering would otherwise latch onto, the normalized images are converted to grayscale and Sobel filters are applied to increase local contrast.
- k-means requires the number of clusters k as an input; although ImageNet has 1000 classes, setting k = 10,000 gives better results.
- The model used is AlexNet, with 5 convolutional layers and 3 fully connected (FC) layers.
- While clustering, the last FC layer is removed and the feature vector from the preceding layer is taken.
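
Assuming the pipeline is implemented with torchvision (the paper's code is in PyTorch), the two preprocessing paths and the fixed Sobel stage could look as follows; the normalization statistics are the standard ImageNet ones, which is an assumption here:

```python
import torch
import torch.nn as nn
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats (assumed)
                                 std=[0.229, 0.224, 0.225])

# Clustering phase: deterministic resize + center crop
cluster_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

# Training phase: random crop and flip so the clusters become invariant to them
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    normalize,
])

class Sobel(nn.Module):
    """Fixed grayscale conversion + Sobel filtering, run before the ConvNet."""
    def __init__(self):
        super().__init__()
        self.gray = nn.Conv2d(3, 1, kernel_size=1, bias=False)
        self.gray.weight.data.fill_(1.0 / 3.0)        # average the RGB channels
        self.sobel = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)
        gx = torch.tensor([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
        self.sobel.weight.data = torch.stack([gx, gx.t()]).unsqueeze(1)
        for p in self.parameters():
            p.requires_grad = False                    # filters are never trained

    def forward(self, x):
        return self.sobel(self.gray(x))                # 3-channel RGB -> 2-channel edges
```

Applying Sobel as a frozen `nn.Module` in front of the ConvNet, rather than as a data transform, mirrors how the released DeepCluster code handles it.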
- ### Avoiding Trivial Solutions
- **Empty clusters:** When a cluster becomes empty, we randomly select a non-empty cluster and use its centroid with a small random perturbation as the new centroid for the empty cluster.
- **Trivial parametrization:** If the vast majority of images are assigned to a few clusters, the parameters will exclusively discriminate between them. A strategy to circumvent this issue is to sample images based on a uniform distribution over the classes, i.e., the pseudo-labels.
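
Both workarounds fit in a few lines; the helper names below are illustrative, not from the paper's code, and the uniform sampling over pseudo-labels is done with PyTorch's `WeightedRandomSampler`:

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def reassign_empty_clusters(centroids, counts, rng, eps=1e-4):
    """Replace every empty centroid with a perturbed copy of a non-empty one."""
    nonempty = np.flatnonzero(counts > 0)
    for j in np.flatnonzero(counts == 0):
        src = rng.choice(nonempty)
        centroids[j] = centroids[src] + eps * rng.standard_normal(centroids.shape[1])
    return centroids

def uniform_pseudo_label_sampler(pseudo_labels):
    """Weight each image by 1 / (its cluster's size) so clusters are sampled uniformly."""
    counts = np.bincount(pseudo_labels)
    weights = 1.0 / counts[pseudo_labels]
    return WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                 num_samples=len(pseudo_labels))
```

The sampler plugs straight into the training loader, e.g. `DataLoader(dataset, batch_size=256, sampler=uniform_pseudo_label_sampler(labels))`.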
- ### Evaluation
- The model is evaluated on PASCAL VOC object detection and semantic segmentation, where it outperforms other unsupervised methods.
- The DeepCluster alternation implicitly favours balanced datasets, so to show it also works on an uncurated data distribution, random Flickr images from YFCC100M are used for pretraining.
- The results show that the model is robust to this change in data distribution.
- In supervised learning, deeper models like VGG and ResNet outperform AlexNet; the paper checks whether the same holds in the unsupervised setting.
- Training DeepCluster with VGG-16 instead of AlexNet indeed gives better results, only 1.4% below the supervised topline.
- For instance-level recognition, the model is evaluated on the Oxford Buildings dataset with image retrieval as the downstream task.
- The results show that pretraining plays an important role in retrieval performance.
- ### Conclusion
- DeepCluster outperforms previous state-of-the-art unsupervised methods on many transfer learning tasks.
- It does not require any domain-specific knowledge.