
Notes on "Deep Clustering for Unsupervised Learning of Visual Features"

tags: notes unsupervised

Notes Author: Rohit Lal


Brief Outline

  • Little work has been done to adapt clustering to the end-to-end training of visual features on large-scale datasets.
  • The paper proposes DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
  • DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network.

Methodology


Overall, DeepCluster alternates between clustering the features to produce pseudo-labels and updating the parameters of the convnet by predicting these pseudo-labels.

  • Take a randomly initialised CNN.
  • The performance of such random features on standard transfer tasks is already far above chance level: for example, a multilayer perceptron classifier on top of the last convolutional layer of a random AlexNet achieves 12% accuracy on ImageNet, while chance is 0.1%.
  • Cluster the output of the convnet using k-means.
  • Use the resulting cluster assignments as "pseudo-labels" to optimise the CNN's classification loss (a minimal sketch of this loop follows the list).
  • This type of alternating procedure is prone to trivial solutions, discussed in the next section.
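
Below is a minimal sketch of one DeepCluster epoch, assuming a PyTorch-style setup. It is illustrative rather than the paper's exact recipe: the names (`deepcluster_epoch`, `convnet`, `head`) and hyper-parameters are assumptions, the dataloader is assumed to yield `(images, dataset_indices)` so pseudo-labels can be looked up per sample, and the paper's PCA reduction, whitening, and L2-normalisation of features before clustering are omitted for brevity.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def deepcluster_epoch(convnet, dataloader, k=1000, device="cpu"):
    # 1) Extract features for the whole dataset with the current convnet.
    #    convnet is assumed to output flat feature vectors of shape (batch, dim).
    convnet.eval()
    feats = [None] * len(dataloader.dataset)
    with torch.no_grad():
        for x, idx in dataloader:            # idx: dataset indices of the batch
            for i, f in zip(idx.tolist(), convnet(x.to(device)).cpu()):
                feats[i] = f
    feats = torch.stack(feats).numpy()

    # 2) Cluster the features with k-means; the assignments become pseudo-labels.
    pseudo_labels = torch.as_tensor(KMeans(n_clusters=k, n_init=10).fit_predict(feats))

    # 3) Cluster IDs are arbitrary and change between epochs, so re-initialise
    #    the linear classification head, then train convnet + head to predict
    #    the pseudo-labels with a standard cross-entropy loss.
    head = nn.Linear(feats.shape[1], k).to(device)
    opt = torch.optim.SGD(list(convnet.parameters()) + list(head.parameters()),
                          lr=0.05, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    convnet.train()
    for x, idx in dataloader:
        loss = criterion(head(convnet(x.to(device))), pseudo_labels[idx].to(device))
        opt.zero_grad()
        loss.backward()
        opt.step()
```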

Avoiding Trivial Solutions

  • Empty clusters: when a cluster becomes empty, we randomly select a non-empty cluster and use its centroid with a small random perturbation as the new centroid for the empty cluster.
  • Trivial parametrization: if the vast majority of images are assigned to a few clusters, the parameters will exclusively discriminate between them. A strategy to circumvent this issue is to sample images based on a uniform distribution over the classes, or pseudo-labels. Both fixes are sketched below.
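
A hedged NumPy sketch of both work-arounds; the function names and the perturbation scale `eps` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def fix_empty_clusters(centroids, assignments, eps=1e-4, rng=np.random):
    """Re-seed every empty cluster from a randomly chosen non-empty cluster,
    copying its centroid plus a small random perturbation."""
    k = centroids.shape[0]
    counts = np.bincount(assignments, minlength=k)
    non_empty = np.flatnonzero(counts > 0)
    for c in np.flatnonzero(counts == 0):
        donor = rng.choice(non_empty)
        centroids[c] = centroids[donor] + eps * rng.standard_normal(centroids.shape[1])
    return centroids

def uniform_sampling_weights(assignments):
    """Per-image sampling weights that make every pseudo-class equally likely,
    so a few giant clusters cannot dominate training."""
    counts = np.bincount(assignments)
    return 1.0 / counts[assignments]
```

The per-image weights can be fed to something like `torch.utils.data.WeightedRandomSampler` so that each training batch is drawn approximately uniformly over the pseudo-labels.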

Conclusion

  • It iterates between clustering the features produced by the convnet with k-means and updating the convnet's weights by predicting the cluster assignments as pseudo-labels in a discriminative loss.
  • It achieves performance significantly better than the previous state of the art on every standard transfer task.
  • It makes few assumptions about the inputs and requires little domain-specific knowledge, making it a good candidate to learn deep representations specific to domains where annotations are scarce.