## MoCo

| Title:       | Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) |
| ------------ | ---- |
| **Authors:** | Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick |
| **Blog:**    | |
| **Read:**    | ✓ |
| **Referee:** | |

### Motivation

Producing a feature extractor through contrastive learning as dictionary look-up involves training a query encoder and a key encoder so that the encoding of a query is similar to the encoding of its positive key and dissimilar to the encodings of the negative keys. A large number of negative keys is needed for the method to be effective, which makes end-to-end training computationally intractable.

### Approach

**MoCo is an algorithm for training the key encoder that is more scalable than end-to-end training.** Only the query encoder is trained through back-propagation; the parameters of the key encoder are set to a weighted average of the query encoder's parameters and the key encoder's previous parameters (a momentum update; a code sketch appears at the end of these notes). Prior to MoCo, the only mechanism available to accomplish this was the [Memory Bank](https://arxiv.org/pdf/1805.01978v1.pdf).

### Experiments

#### Linear classification

Feature extractors obtained through self-supervised learning with MoCo, Memory Bank, and end-to-end training were benchmarked on ImageNet under the linear classification protocol. MoCo outperformed Memory Bank and performed competitively with end-to-end training.

#### Transferring features to downstream tasks

MoCo feature extractors trained on ImageNet and Instagram-1B were benchmarked on 6 downstream tasks against a feature extractor obtained through supervised training on ImageNet. On 5 tasks, the Instagram-1B feature extractor outperforms the supervised counterpart, and the ImageNet feature extractor outperforms it or performs competitively with it. On 1 task, the supervised counterpart outperforms both MoCo feature extractors.
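
To make the momentum update and the queue of negative keys concrete, here is a minimal PyTorch-style sketch of the training step described in the Approach section. It is not the authors' implementation: the toy linear encoders, the random tensors standing in for augmented image views, and the specific values of `feat_dim`, `queue_size`, `m`, and `t` are illustrative assumptions.

```python
# Minimal sketch of a MoCo-style training step (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, queue_size, m, t = 128, 4096, 0.999, 0.07   # assumed toy hyperparameters

encoder_q = nn.Sequential(nn.Linear(512, feat_dim))    # query encoder (trained by backprop)
encoder_k = nn.Sequential(nn.Linear(512, feat_dim))    # key encoder (momentum-updated only)
encoder_k.load_state_dict(encoder_q.state_dict())      # start from the same parameters
for p in encoder_k.parameters():
    p.requires_grad = False

# Dictionary of negative keys, maintained as a queue (columns are keys).
queue = F.normalize(torch.randn(feat_dim, queue_size), dim=0)
optimizer = torch.optim.SGD(encoder_q.parameters(), lr=0.03)

for step in range(10):                                  # stand-in for the training loop
    x_q = torch.randn(32, 512)                          # one augmented "view" of a batch
    x_k = torch.randn(32, 512)                          # another augmented "view"

    q = F.normalize(encoder_q(x_q), dim=1)              # queries: N x C
    with torch.no_grad():
        k = F.normalize(encoder_k(x_k), dim=1)          # positive keys: N x C, no gradient

    # Contrastive (InfoNCE-style) logits: one positive per query, queue entries as negatives.
    l_pos = (q * k).sum(dim=1, keepdim=True)            # N x 1
    l_neg = q @ queue                                    # N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    labels = torch.zeros(q.size(0), dtype=torch.long)   # the positive key sits at index 0
    loss = F.cross_entropy(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                     # only the query encoder is updated here

    with torch.no_grad():
        # Momentum update: key encoder = weighted average of its previous parameters
        # and the query encoder's current parameters.
        for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
            p_k.mul_(m).add_(p_q, alpha=1 - m)
        # Enqueue the newest keys and dequeue the oldest ones.
        queue = torch.cat([k.T, queue], dim=1)[:, :queue_size]
```

The key point the sketch illustrates is that the dictionary size (the queue) is decoupled from the mini-batch size, so many negative keys can be used without back-propagating through the key encoder.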