# Notes on "[Deep Clustering for Unsupervised Learning of Visual Features](https://arxiv.org/pdf/1807.05520.pdf)" ###### tags: `notes` `unsupervised` Notes Author: [Rohit Lal](https://rohitlal.net/) --- ## Brief Outline - Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. - DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features - DeepCluster iteratively groups the features with a standard clustering algorithm, kmeans, and uses the subsequent assignments as supervision to update the weights of the network. ## Methodology ![](https://i.imgur.com/X4WDXsZ.png) > Overall, DeepCluster alternates between clustering the features to produce pseudo-labels and updating the parameters of the convnet by predicting these pseudo-labels. - Take a randomly initialised CNN. - the performance of such random features on standard transfer tasks, is far above the chance level. For example, a multilayer perceptron classifier on top of the last convolutional layer of a random AlexNet achieves 12% in accuracy on ImageNet while the chance is at 0.1% - cluster the output of the convnet using k means - use the subsequent cluster assignments as “pseudo-labels” to optimize the CNN loss - This type of alternating procedure is prone to trivial solutions ### Avoiding Trivial Solutions - *Empty clusters*: when a cluster becomes empty, we randomly select a non-empty cluster and use its centroid with a small random perturbation as the new centroid for the empty cluster. - *Trivial parametrization*: If the vast majority of images is assigned to a few clusters, the parameters will exclusively discriminate between them. A strategy to circumvent this issue is to sample images based on a uniform distribution over the classes, or pseudo-labels. ## Conclusion - It iterates between clustering with k-means the features produced by the convnet and updating its weights by predicting the cluster assignments as pseudo-labels in a discriminative loss. - it achieves performance that are significantly better than the previous state-of-the-art on every standard transfer task. - makes little assumption about the inputs, and does not require much domain specific knowledge, making it a good candidate to learn deep representationsZspecific to domains where annotations are scarce.