clustering
I want to use unsupervised learning to try to understand what is going on inside neural networks. In particular I mean mapping activation patterns (which exist in some weird incomprehensible space) to a space with human understandable structure.
Original paper by Facebook AI Research
Iteratively repeat the following:
1. Run K-means on the features produced by the network, assigning each input a pseudo-label (its cluster ID).
2. Train the network to predict those pseudo-labels, as in ordinary supervised classification.
From the original paper:
a multilayer perceptron classifier on top of the last convolutional layer of a random AlexNet achieves 12% in accuracy on ImageNet while the chance is at 0.1% (source)
This means that the output from the last convolution of the random AlexNet still preserves a significant amount of structure, which K-means can exploit to produce clusters which are not entirely random. The clusters are still mostly random, but the little bit of structure can be bootstrapped to provide a training signal which leads to features which can be used to more cleanly separate the data into distinct categories.
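This bootstrapping loop can be sketched on toy data. The code below is a minimal illustration, not the paper's implementation: the "features" are synthetic Gaussian blobs, the "network" is just a linear softmax head, and all shapes, learning rates, and iteration counts are made up. In the real method, step 2 trains the whole convnet, so it also improves the features that step 1 clusters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for activations: three Gaussian blobs in 8 dimensions.
centers = rng.normal(size=(3, 8)) * 4.0
X = np.concatenate([c + rng.normal(size=(100, 8)) for c in centers])

def kmeans(X, k, iters=20):
    """Plain k-means; returns a pseudo-label (cluster ID) per row of X."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels

k = 3
for epoch in range(5):
    y = kmeans(X, k)       # step 1: cluster features into pseudo-labels
    W = np.zeros((8, k))   # reinit the head: cluster IDs are arbitrary each epoch
    onehot = np.eye(k)[y]
    for _ in range(300):   # step 2: fit a softmax classifier on the pseudo-labels
        logits = X @ W
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        W -= 0.1 * X.T @ (p - onehot) / len(X)

acc = (np.argmax(X @ W, 1) == y).mean()
print(f"classifier agreement with final pseudo-labels: {acc:.2f}")
```

Because the blobs have real structure, the classifier ends up agreeing closely with the pseudo-labels; with truly structureless features the pseudo-labels would be unlearnable noise and the loop would go nowhere.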
In theory, there are arbitrarily many sets of features which allow us to cleanly separate the data. The network, however, has an implicit prior over the kinds of features it learns, and it seems that this prior can lead to human-interpretable features.
The reason this works, in my opinion, is that the Natural Abstraction Hypothesis is true.
The problem with applying unsupervised methods to activation patterns in a NN is that it's a really high-dimensional space, and we don't really know what structure the data within it will have. Applying brittle methods which make incorrect assumptions about that structure will likely be suboptimal. For example, K-means is notorious for failing to model irregularly shaped clusters.
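A quick synthetic illustration of that failure mode (the data and numbers here are invented for the demo): K-means can only carve space into convex Voronoi cells, so on two concentric rings it splits the data by angle instead of recovering the radial structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: the "true" clusters are radial, not convex.
theta = rng.uniform(0, 2 * np.pi, 200)
inner = np.stack([np.cos(theta[:100]), np.sin(theta[:100])], 1) * 1.0
outer = np.stack([np.cos(theta[100:]), np.sin(theta[100:])], 1) * 4.0
X = np.concatenate([inner, outer])
true = np.repeat([0, 1], 100)

# Plain k-means with k=2.
centroids = X[rng.choice(len(X), 2, replace=False)]
for _ in range(30):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    for j in range(2):
        if (labels == j).any():
            centroids[j] = X[labels == j].mean(0)

# Score under the best label permutation; ~0.5 means no better than chance.
acc = max((labels == true).mean(), (labels != true).mean())
print(f"best-permutation accuracy on the rings: {acc:.2f}")
```

K-means' assumption (roughly isotropic, centroid-shaped clusters) is simply wrong for this data, which is the point of the paragraph above.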
Each unsupervised method must make some assumptions, or have some prior, over the kind of structure it is meant to extract. Different methods make different assumptions, and it takes a lot of experience and practice to know which method to apply in which situation.
The advantage of Deep Clustering is that we can use a NN's own simplicity prior, instead of a prior handcrafted by humans, to try to extract features from an activation pattern. An activation pattern is a type of data produced by one part of a NN with the purpose of being useful to another part of a NN. It seems natural to expect that the simplicity prior of the NN we intend to study may be very similar to the simplicity prior of a NN we intend to use as an unsupervised reporter.
More broadly, I'd like to explore any and all unsupervised methods (not just Deep Clustering) which use the simplicity prior of a NN rather than handcrafted assumptions, and use them to produce powerful unsupervised reporters which are immune to Goodharting.
This method is similar to Deep Clustering, but claims to be more flexible and efficient. (I don't 100% understand how it works yet; here are their paper and code.)
They also find that the learned features work as a pretraining step for supervised learning tasks, where they claim to outperform Deep Clustering, and that the features can be used directly for KNN classification.
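The kNN evaluation protocol is simple enough to sketch. The block below is a generic illustration on synthetic "pretrained features" (all data and dimensions invented), not the authors' setup: classify each held-out point by majority vote over its nearest labelled neighbours in feature space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pretrained features for a labelled evaluation set:
# three classes, 16-D, synthetic Gaussian blobs.
centers = rng.normal(size=(3, 16)) * 3.0
train_f = np.concatenate([c + rng.normal(size=(50, 16)) for c in centers])
train_y = np.repeat(np.arange(3), 50)
test_f = np.concatenate([c + rng.normal(size=(20, 16)) for c in centers])
test_y = np.repeat(np.arange(3), 20)

def knn_predict(train_f, train_y, query, k=5):
    """Majority vote over the k nearest training points in feature space."""
    d = ((query[:, None, :] - train_f[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d, 1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(v, minlength=3).argmax() for v in votes])

acc = (knn_predict(train_f, train_y, test_f) == test_y).mean()
print(f"kNN accuracy on synthetic features: {acc:.2f}")
```

The appeal of this protocol is that it involves no further training at all: if the features already place same-class inputs near each other, kNN accuracy is high, so it directly measures the quality of the learned representation.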