# Reproduction of SimCLR

### Reproduced by Chaoyi Zhu, Congwen Chen, and Zhiyang Liu (5534496, email: z.liu-57@student.tudelft.nl)

---------------------

This blog presents our reproduction of the paper [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/pdf/2002.05709.pdf). The paper introduces a new method for contrastive learning of visual representations. The key innovation of this work is the use of **aggressive data augmentations**: the resulting "harder" tasks dramatically improve the quality of the learned representations. The method achieved strong results, outperforming previous self-supervised and semi-supervised methods on ImageNet and generalizing well to other datasets. Our objective is to reproduce the results in Table 8 of the original paper, shown below. This blog is divided into two parts: an introduction to the content of the paper, and the work we have done. Our work includes reproducing the paper, extending it to the RPLAN dataset, and visualizing the training results.

![](https://i.imgur.com/JORMqqp.png)

## Introduction

---------------------

For decades, a large class of machine-learning methods has relied on human-provided labels or rewards as the only learning signal used during training. These methods, known as supervised learning approaches, depend heavily on the amount of annotated training data available. Although raw data is vastly available, annotating it is known to be expensive.

### Self-supervised Learning

Self-supervised learning (SSL) is a machine-learning paradigm that learns from unlabeled data and can be regarded as an intermediate form between supervised and unsupervised learning. How can we train a neural network without labels? Neural networks are generally trained on some objective function; without labels, how can we measure the performance of a network? Self-supervised learning answers this question by proposing **tasks** for the network to solve whose performance is easy to measure. In computer vision (CV), for example, the task could be filling in image holes or colorizing grayscale images. Ideally, a good task is difficult to solve unless the network captures some form of image semantics. Neural networks pre-trained on these tasks can then be fine-tuned on downstream tasks with less labeled data than networks initialized randomly. A typical SSL procedure is shown in the figure below.

![](https://i.imgur.com/PBammSs.png)

### Contrastive Self-supervised Learning

Contrastive learning methods, one of the most popular topics of the past few years, learn representations by contrasting positive and negative examples: they aim to pull similar samples closer together and push dissimilar samples apart. Contrastive self-supervised learning uses such methods in the pre-training process and has led to great empirical success in computer vision tasks.

The main motivation for contrastive learning comes from human learning patterns: humans recognize objects without remembering all the little details. For example, Epstein ran an [experiment](https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer) in 2016 that asked subjects to draw a dollar bill in as much detail as possible. In the figure below, the left drawing was made from memory, without any reference, and the right one was drawn with a bill (not a one-dollar bill) in hand.
People are very familiar with bills, yet the experiment shows that a few abstract features (the figures in the corners, a portrait in the middle), rather than all the details, are what help people recognize or remember an item.

![](https://i.imgur.com/MScuyWA.png)

Roughly speaking, we create some kind of representation in our minds and then use it to recognize new objects. The main goal of contrastive self-supervised learning is to create and generalize such representations. More formally, for any data point $x$, contrastive methods aim to learn an encoder $f$ that maximizes $\mathrm{similarity}(f(x), f(x^+))$ and minimizes $\mathrm{similarity}(f(x), f(x^-))$. Here, $x^+$ is a data point similar to the input $x$: the observations $x$ and $x^+$ are correlated, so the pair $(x, x^+)$ is a positive pair, while $x$ and $x^-$ are unrelated, so $(x, x^-)$ is a negative pair. In most cases, we can apply different augmentation techniques (image rotation, cropping, etc.) to generate positive samples. In contrastive learning, we aim to minimize the difference within positive pairs while maximizing the difference between positives and negatives.

### A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)

Here comes the method we set out to reproduce, SimCLR [1]. It follows the principles of contrastive learning described above. As mentioned earlier, the essential strength of the work lies in its aggressive data augmentation: "harder" samples are generated to train the network's ability to learn representations. In fact, the authors demonstrate in the paper that such stronger data augmentation benefits unsupervised contrastive learning "dramatically", while the same augmentation does not improve, and can even hurt, the performance of supervised models.

The architecture of SimCLR is shown below. An image is taken and random transformations are applied to it to obtain a pair of augmented images $x_i$ and $x_j$. Each image in the pair is passed through an encoder to get a representation, and a non-linear fully connected layer is then applied to obtain the representations $z_i$ and $z_j$. The task is to maximize the similarity between $z_i$ and $z_j$ for the same image.

![](https://i.imgur.com/nBp4aSF.jpg)

Below we describe the training procedure of SimCLR. We start from a training corpus consisting of unlabeled images.

**1. Data augmentation**

First, we perform data augmentations on a batch. The actual batch size can be very large (e.g., 8192); for convenience, we use a small batch size of $N = 2$ in the explanation below.

![](https://i.imgur.com/tDoPcvZ.png)

Many augmentation operations are possible. The paper tests various data augmentation operations and their combinations, including spatial transformations such as cropping and rotation and appearance transformations such as color distortion. After their experiments, the authors conclude that:

1. no single transformation suffices to learn good representations;
2. the quality of representations improves dramatically when augmentations are composed.

The composition of random cropping and random color distortion stands out.

![](https://i.imgur.com/XfqFzwT.png)

For each image in the batch, we generate an augmented version of it, so for a batch size of $N$ we obtain $2N$ images.
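To make the augmentation step concrete, here is a minimal PyTorch/torchvision sketch of a SimCLR-style pipeline that returns two augmented views per image. The crop size and color-jitter strengths below are illustrative values for CIFAR-10-sized images, not necessarily the exact settings of the paper or of our code.

```python
import torchvision.transforms as T

# SimCLR-style augmentation: random crop + flip + color distortion + grayscale.
# Parameters are illustrative (CIFAR-10-sized images), not the paper's exact settings.
simclr_augment = T.Compose([
    T.RandomResizedCrop(size=32),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

class TwoCropsTransform:
    """Apply the same augmentation pipeline twice to get two views of one image."""
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, img):
        return self.transform(img), self.transform(img)

# Example usage: a CIFAR-10 dataset that yields a pair of views per image.
# dataset = torchvision.datasets.CIFAR10(root="./data", train=True,
#                                        transform=TwoCropsTransform(simclr_augment))
```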
**2. Encoding**

The pair of images $(x_i, x_j)$ is then encoded to obtain representations. The encoded representations usually have much lower dimensionality, which is more efficient to work with. The encoder is generic and can be replaced by other designs; ResNet-50 is used in this paper.

![](https://i.imgur.com/fgmyacm.png)

**3. Projection head**

In SimCLR, the visual representations $h_i, h_j$ produced by the encoder are then processed by a projection head $g(\cdot)$, giving the final representation $z = g(h)$. In the paper, the projection head is a multilayer perceptron with two dense layers, where the hidden layer uses a ReLU activation function.

![](https://i.imgur.com/igTzVYr.png)

**4. Loss calculation**

At this step, we have the final representations $z_1, \ldots, z_4$ (for our example batch of $N = 2$).

![](https://i.imgur.com/TXElx1M.png)

Cosine similarity is used to measure the similarity between representations.

![](https://i.imgur.com/QaMF4rg.png)

To calculate the loss over a batch, the paper uses the NT-Xent loss (Normalized Temperature-Scaled Cross-Entropy Loss). The loss of each pair of representations is calculated as:

![](https://i.imgur.com/G4Bwi1W.png)

And the loss of a batch is the average over all the pairs:

![](https://i.imgur.com/ySjWvp1.png)

This concludes the training iteration of a batch.
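To summarize steps 2–4, the projection head and the NT-Xent loss can be sketched in PyTorch as follows. This is a minimal illustration under our own assumptions (the feature dimensions and the temperature value are placeholders), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Two-layer MLP g(.) with a ReLU hidden layer; dimensions here are illustrative."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):
        return self.net(h)

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss for a batch of N positive pairs; z_i and z_j have shape [N, d]."""
    n = z_i.size(0)
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)   # [2N, d], unit-norm rows
    sim = z @ z.t() / temperature                           # cosine similarities / tau
    sim.fill_diagonal_(float("-inf"))                       # exclude self-similarity terms
    # For row k, the positive example sits N positions away in the concatenated batch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)                    # average over all 2N examples

# Usage (per batch): h_i, h_j = encoder(x_i), encoder(x_j)
#                    loss = nt_xent_loss(head(h_i), head(h_j))
```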
**5. Fine-tune**

At this point, the encoder has been trained to output representations, and the model is ready to be fine-tuned for downstream tasks. The paper reports that SimCLR with ResNet-50 (4x) reaches 76.5% top-1 accuracy on ImageNet, and 85.8% top-5 accuracy when fine-tuned with only 1% of the labels. In the following sections, we introduce our implementation of SimCLR and the results.

## Our work

The goal of this report is to show our effort in reproducing the paper "A Simple Framework for Contrastive Learning of Visual Representations" ([link to paper](https://arxiv.org/pdf/2002.05709.pdf)). We reimplement SimCLR in PyTorch based on the official TensorFlow version. As required by the course, we reproduce the results of Table 8 on the CIFAR-10 dataset and obtain a nice visualization of the trained image vectors. We also extend our work to a new dataset, RPLAN, and achieve good visualization results there as well. In general, our work can be divided into the following parts:

- Reimplement the paper using PyTorch in a Jupyter Notebook.
- Reproduce the results of Table 8 in the paper with different training strategies, including finetuning and linear evaluation, using the pretrained ResNet(1X) and ResNet(4X) models.
- Extend SimCLR to the RPLAN dataset. This includes applying transforms to the RPLAN images (so that they fit the model) and training on these images.
- Visualize the trained image vectors on the CIFAR-10 and RPLAN datasets using PCA and t-SNE, and analyze the pretraining performance of the model.

### Reproduction of the paper

This section focuses on reproducing the Table 8 results on CIFAR-10. We use the official pretrained checkpoint to pre-load the ResNet model before finetuning. To adapt the model to the downstream classification task, we add a logistic regression layer on top of the ResNet model. We train the model with two different strategies: finetuning and linear evaluation. Finetuning is like a normal training process: the gradient flows through the whole model and all parameters are updated after each backward pass. For linear evaluation, the parameters of the ResNet model are frozen during training and only the parameters of the logistic regression layer are updated (a minimal sketch of this difference is shown below).

The two strategies share almost the same code; the difference lies mainly in the training and testing process. We use two versions of the ResNet model, ResNet50(1X) and ResNet50(4X), train each for 500 epochs, and compare the test accuracy with the results in the paper. We set the batch size to 64 for the ResNet1X model and 32 for the ResNet4X model; these are the largest sizes we can use, since increasing them causes an out-of-memory (OOM) error. We run our code on the CIFAR-10 dataset for the reproduction, but it can also be applied to other image classification datasets, such as STL-10. To convert the official TensorFlow checkpoint to a PyTorch checkpoint, we use the converter provided in the [tonylins/simclr-converter](https://github.com/tonylins/simclr-converter) repository and load the converted checkpoint during finetuning and linear evaluation.
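As an illustration of the difference between the two strategies, here is a minimal PyTorch sketch. The `resnet` argument stands for a backbone loaded from the converted checkpoint and is assumed to output a flat feature vector; the feature dimension and number of classes are CIFAR-10 placeholders, and this is not our exact training code.

```python
import torch
import torch.nn as nn

def build_classifier(resnet, feat_dim=2048, num_classes=10, linear_eval=True):
    """Attach a logistic-regression (single linear) layer to a pretrained backbone.

    Assumes `resnet` maps an image batch to a flat [N, feat_dim] feature tensor.
    linear_eval=True  -> freeze the backbone, train only the linear layer.
    linear_eval=False -> finetune, all parameters receive gradients.
    """
    if linear_eval:
        for p in resnet.parameters():
            p.requires_grad = False
    return nn.Sequential(resnet, nn.Linear(feat_dim, num_classes))

# Only parameters with requires_grad=True are handed to the optimizer, e.g.:
# model = build_classifier(resnet, linear_eval=True)
# optimizer = torch.optim.Adam(
#     filter(lambda p: p.requires_grad, model.parameters()), lr=3e-4)
```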
#### Analysis of loss curves

##### Linear evaluation

We use Weights & Biases to record the loss curves of the training process, as shown in the following graphs. The x-axis is the epoch number and the y-axis is the corresponding accuracy or loss. During linear evaluation, the model converges quickly and the loss almost reaches 0 in around 200 epochs. We also compare the convergence speed of ResNet1X and ResNet4X; the results show that a larger pretrained model does not make the logistic regression layer converge faster.

<img src="https://s1.ax1x.com/2022/04/08/LpA8N6.png" alt="i1" style="zoom: 15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/961713567076352130/WB_Chart_4_7_2022_9_41_15_PM.png" alt="i2" style="zoom:15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/961717185229779024/WB_Chart_4_7_2022_10_00_22_PM.png" alt="i3" style="zoom:15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/961717185468850206/WB_Chart_4_7_2022_10_00_28_PM.png" alt="i4" style="zoom:15%;" />

##### Finetuning

We also plot the learning curves of the finetuning phase and compare them with the learning curves of a model trained from scratch; the graphs are shown below. Unlike linear evaluation, finetuning requires much more computation and takes approximately 30 minutes per epoch, so the x-axis here is the step number. The pretrained models converge much faster than the model trained from scratch and reach much higher accuracy during training, which shows that the pretraining does take effect.

<img src="https://cdn.discordapp.com/attachments/884910103428476989/962094184561533028/WB_Chart_4_8_2022_10_57_18_PM.png" alt="i1" style="zoom: 15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/962094184825765888/WB_Chart_4_8_2022_10_57_05_PM.png" alt="i2" style="zoom:15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/962094185064833054/WB_Chart_4_8_2022_10_57_32_PM.png" alt="i3" style="zoom:15%;" /><img src="https://cdn.discordapp.com/attachments/884910103428476989/962094185270358046/WB_Chart_4_8_2022_10_57_25_PM.png" alt="i4" style="zoom:15%;" />

#### Result comparison

In this section, we compare the performance of the different training strategies with each other and with the results reported in the paper. All results are shown in the table below. We could not run ResNet4X finetuning because of a lack of computational resources. Our ResNet1X finetuning reaches almost the same performance as the original paper. However, after trying different training settings, our linear evaluation still cannot reach the performance reported in the paper. After re-reading the paper and comparing our code with the official code, we think this might result from the small batch size we use: the authors show that a batch size above 512 can significantly increase performance, but due to limited memory that is not feasible for us.

| Training Setup | Note | Accuracy |
| --------------------------- | ----------------------------- | -------- |
| ResNet1X finetune | loading pretrained checkpoint | 0.955 |
| ResNet1X finetune | learn from scratch | 0.823 |
| ResNet1X finetune | in the original paper | 0.977 |
| ResNet1X linear evaluation | our implementation | 0.852 |
| ResNet1X linear evaluation | in the original paper | 0.906 |
| ResNet4X linear evaluation | our implementation | 0.897 |
| ResNet4X linear evaluation | in the original paper | 0.953 |

## References

[1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," *arXiv preprint arXiv:2002.05709*, 2020.
