# Notes on "[Deep Clustering for Unsupervised Learning of Visual Features](https://arxiv.org/pdf/1807.05520.pdf)"
###### tags: `notes` `unsupervised`
Notes Author: [Rohit Lal](https://rohitlal.net/)
---
## Brief Outline
- Little work has been done to adapt clustering approaches to the end-to-end training of visual features on large-scale datasets.
- The paper proposes DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
- DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the resulting assignments as supervision to update the weights of the network.
## Methodology
![](https://i.imgur.com/X4WDXsZ.png)
> Overall, DeepCluster alternates between clustering the features to produce pseudo-labels and updating the parameters of the convnet by predicting these pseudo-labels.
- Take a randomly initialised CNN.
- The performance of such random features on standard transfer tasks is far above chance level: for example, a multilayer perceptron classifier on top of the last convolutional layer of a random AlexNet achieves 12% accuracy on ImageNet, while chance is at 0.1%.
- Cluster the output features of the convnet with k-means.
- Use the resulting cluster assignments as “pseudo-labels” and train the convnet to predict them with a standard classification loss (a minimal sketch of this loop follows the list).
- This type of alternating procedure is prone to trivial solutions.
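
Below is a minimal, hypothetical sketch of one such iteration using PyTorch and scikit-learn's k-means. The `convnet`, the `classifier` head, the learning rate, and the single full-batch gradient step are my own placeholders, not the authors' implementation (which trains on full ImageNet with many mini-batches per epoch and specific hyperparameters).

```python
# Hypothetical sketch of one DeepCluster iteration (not the authors' code):
# extract features -> cluster with k-means -> train on the pseudo-labels.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def deepcluster_iteration(convnet, classifier, images, k, lr=0.05):
    """images: (N, C, H, W) tensor; convnet maps each image to a feature vector."""
    # 1) Extract features with the current network.
    convnet.eval()
    with torch.no_grad():
        feats = convnet(images).flatten(1).cpu().numpy()

    # 2) Cluster the features; the assignments become pseudo-labels.
    pseudo_labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    targets = torch.from_numpy(pseudo_labels).long()

    # 3) Re-initialise the classification head (cluster ids are arbitrary between
    #    clustering rounds), then update the network to predict the pseudo-labels.
    for p in classifier.parameters():
        if p.dim() > 1:
            nn.init.kaiming_uniform_(p)
        else:
            nn.init.zeros_(p)

    opt = torch.optim.SGD(
        list(convnet.parameters()) + list(classifier.parameters()), lr=lr
    )
    criterion = nn.CrossEntropyLoss()

    convnet.train()
    opt.zero_grad()
    loss = criterion(classifier(convnet(images).flatten(1)), targets)
    loss.backward()
    opt.step()
    return pseudo_labels, loss.item()
```

Note that in the paper the features are also PCA-reduced, whitened, and L2-normalised before clustering; that preprocessing is omitted from the sketch.
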
### Avoiding Trivial Solutions
- *Empty clusters*: when a cluster becomes empty, randomly select a non-empty cluster and use its centroid, with a small random perturbation, as the new centroid for the empty cluster.
- *Trivial parametrization*: if the vast majority of images are assigned to a few clusters, the parameters will exclusively discriminate between those clusters. A strategy to circumvent this is to sample images based on a uniform distribution over the classes (pseudo-labels), which is equivalent to weighting each image's contribution to the loss by the inverse of its cluster size. Both fixes are sketched below.
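
A small, hypothetical NumPy sketch of both fixes; the function names and the toy labels are my own, not the paper's code.

```python
# Hypothetical sketch of the two fixes above (not the authors' implementation).
import numpy as np

def reseed_empty_clusters(centroids, counts, rng, eps=1e-4):
    """Replace each empty cluster's centroid with a slightly perturbed copy of a
    randomly chosen non-empty centroid."""
    centroids = centroids.copy()
    empty = np.flatnonzero(counts == 0)
    nonempty = np.flatnonzero(counts > 0)
    for j in empty:
        src = rng.choice(nonempty)
        centroids[j] = centroids[src] + eps * rng.standard_normal(centroids.shape[1])
    return centroids

def sample_uniform_over_pseudolabels(pseudo_labels, n_samples, rng):
    """Draw image indices so every pseudo-label is equally likely, i.e. each image
    is weighted by the inverse of its cluster's size."""
    counts = np.bincount(pseudo_labels)
    weights = 1.0 / counts[pseudo_labels]
    weights /= weights.sum()
    return rng.choice(len(pseudo_labels), size=n_samples, replace=True, p=weights)

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])   # heavily imbalanced pseudo-labels
batch_idx = sample_uniform_over_pseudolabels(labels, n_samples=8, rng=rng)
```

The paper additionally re-assigns the points of the chosen non-empty cluster between the two resulting centroids; that step is omitted here.
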
## Conclusion
- It alternates between clustering the convnet features with k-means and updating the network's weights by predicting the cluster assignments as pseudo-labels in a discriminative loss.
- It achieves performance significantly better than the previous state of the art on every standard transfer task.
- It makes few assumptions about the inputs and does not require much domain-specific knowledge, making it a good candidate for learning deep representations specific to domains where annotations are scarce.