---
title: ConvNet Evolutions, Architectures, Implementation Details and Advantages.
authors: Chris Ick, Soham Tamba, Ziyu Lei, Hengyu Tang
date: 14 February 2020
---

## [Proto-CNNs and Evolution to Modern CNNs](00:56:40-1:11:20)

### [Proto-Convolutional Neural Nets on Small Dataset](00:56:40-00:58:15)

Inspired by Fukushima's work on visual cortex modeling, combining the simple/complex cell hierarchy with supervised training and backpropagation led to the development of the first ConvNets at the University of Toronto in '88-'89 by Prof. Yann LeCun. The experiments used a small dataset of 320 'mouse-written' digits. The performance of the following architectures was compared:

1. Single FC (fully connected) layer
2. Two FC layers
3. Locally connected layers w/o shared weights
4. Constrained network w/ shared weights and local connections
5. Constrained network w/ shared weights and local connections 2 (more feature maps)

The most successful networks (constrained networks with shared weights) had the strongest generalizability, and they form the basis for modern CNNs. Meanwhile, the single FC layer tends to overfit.

### [First "Real" ConvNets at Bell Labs](00:58:15-00:59:39)

After moving to Bell Labs, LeCun's research shifted to using handwritten zip codes from the US Postal Service to train a larger CNN:

* 256 (16x16) input layer
* 12 5x5 kernels with stride 2 (stepped 2 pixels): the next layer has lower resolution
* **NO** separate pooling

### [Convolutional Network Architecture w/ Pooling](00:59:39-1:05:52)

The next year, some changes were made: separate pooling was introduced. Separate pooling is done by averaging the input values, adding a bias, and passing the result to a nonlinear function (hyperbolic tangent). The 2x2 pooling was performed with a stride of 2, hence reducing resolution by half.

<center>
<img src="detailed_convNet.png" width="600px" /><br>
<b>Fig. 1</b> ConvNet Architecture
</center>

An example of a single convolutional layer would be as follows:

1. Take an input with size *32x32*
2. The convolution layer passes a 5x5 kernel with stride 1 over the image, resulting in a feature map of size *28x28*
3. Pass the feature map to a nonlinear function: size *28x28*
4. Pass to the pooling layer that averages over a 2x2 window with stride 2: size *14x14*
5. Repeat 1-4 for 4 kernels

<!--
1. Take an input of size 32x32
2. The convolutional layer passes a 5x5 kernel with stride 2 over the image
3. Pass the output of the convolution through a nonlinear function
4. Pass the resulting 28x28 feature map to the pooling layer
5. The pooling layer averages over a 2x2 window with stride 2
6. Pass the resulting 14x14 feature map to the next convolutional layer
7. Repeat 1-6 for each kernel
-->
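The feature-map sizes in steps 1-5 are easy to check in PyTorch. The snippet below is a minimal sketch of one conv + average-pool stage under the sizes above (the 4 kernels come from step 5; the trainable bias and tanh of the original subsampling layer are simplified to a plain average pool):

```python
import torch
import torch.nn.functional as F
from torch import nn

x = torch.randn(1, 1, 32, 32)                          # step 1: one 32x32 grayscale input
conv = nn.Conv2d(1, 4, kernel_size=5, stride=1)        # step 5: 4 kernels -> 4 feature maps
fmap = torch.tanh(conv(x))                             # steps 2-3: 5x5 conv + tanh, size 28x28
pooled = F.avg_pool2d(fmap, kernel_size=2, stride=2)   # step 4: 2x2 average pool, size 14x14
print(fmap.shape, pooled.shape)                        # (1, 4, 28, 28) and (1, 4, 14, 14)
```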
The first-layer, simple convolution/pool combinations usually detect simple features, such as oriented edges. After the first convolution/pool layer, the objective is to detect combinations of features from previous layers. To do this, steps 2 to 4 are repeated with multiple kernels over the previous-layer feature maps, and the results are summed into a new feature map:

8. A new 5x5 kernel is slid over all feature maps from previous layers, and the results are summed up (note: in Prof. LeCun's 1989 experiment the connections were not full, for computational reasons; modern settings usually enforce full connections): size *10x10*
9. Pass the output of the convolution to a nonlinear function: size *10x10*
10. Repeat 8/9 for 16 kernels.
11. Pass the result to the pooling layer that averages over a 2x2 window with stride 2: size *5x5* for each feature map

To generate an output, the last layer of convolution is conducted, which looks like full connections but is in fact convolutional.

12. The final convolution layer slides a 5x5 kernel over all feature maps, with the results summed up: size *1x1*
13. Pass through a nonlinear function: size *1x1*
14. Generate the single output for one category.
15. Repeat steps 2-14 for each of the 10 categories (in parallel)

<!--
8. The convolutional layer passes a 5x5 kernel with stride 1 over previous feature map
9. Pass the output of the convolution to a nonlinear function
10. Hold the resulting 10x10 feature map
11. Repeat steps 8/9 for multiple kernels
12. Sum the resulting feature maps and pass to the next pooling layer
13. Repeat steps 8-12 for each set of kernels (10 sets, one for each output)

The resulting 10 sets of feature maps are only a few steps from becoming an output.

8. The pooling layer averages over a 2x2 window with stride 2
9. Pass the resulting 5x5 feature maps to the final convolutional layer
10. The final convolutional layer passes a 5x5 kernel over the image
11. Pass the output of the convolution to a nonlinear function
12. The result is an output
-->

See [this animation](http://cs231n.github.io/convolutional-networks/) on Andrej Karpathy's website for how convolutions change the shape of the next layer's feature maps. The full paper can be found [here](https://papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf).

### [Shift Equivariance](1:05:52-1:09:25)

<center>
<img src="shift_invariance.gif" width="300px" /><br>
<b>Fig. 2</b> Shift Equivariance
</center>

As demonstrated by the animation on the slides (here's another example), translating the input image results in the same translation of the feature maps. However, the changes in the feature maps are scaled by the convolution/pooling operations. E.g. the 2x2 pooling with stride 2 reduces a 1-pixel shift in the input layer to a 0.5-pixel shift in the following feature maps. Spatial resolution is then exchanged for an increased number of feature types, i.e. the representation becomes more abstract and less sensitive to shifts and distortions.

### [Overall Architecture Breakdown](1:09:25-1:11:20)

Generic CNN architecture can be broken down into several basic layer archetypes:

* Normalization layers - adjusting whitening (optional)
  * Subtractive methods, e.g. average removal, high-pass filtering
  * Divisive: local contrast normalization, variance normalization
* Filter banks: increasing dimensionality, finding edges, etc.
* Non-linearities: sparsification
  * Typically the Rectified Linear Unit (ReLU): $\text{ReLU}(x) = \max(x, 0)$
* Pooling: aggregating over a feature map:
  * $\text{MAX} = \max_i(X_i)$
  * $L_p = \left(\sum_{i=1}^n |X_i|^p\right)^{1/p}$
  * $\text{Prob} = \frac{1}{b} \log\left(\sum_{i=1}^n e^{b X_i} \right)$
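These three pooling operators can be illustrated on a single pooling window. The sketch below is only an example; the window values and the $p$ and $b$ hyperparameters are arbitrary choices, not from the lecture:

```python
import torch

x = torch.tensor([0.5, -1.0, 2.0, 0.0])   # one pooling window, flattened
p, b = 2.0, 1.0                            # illustrative hyperparameters

max_pool  = x.max()                                    # MAX pooling
lp_pool   = x.abs().pow(p).sum().pow(1.0 / p)          # Lp-norm pooling
prob_pool = (1.0 / b) * torch.logsumexp(b * x, dim=0)  # log-sum-exp ("Prob") pooling

# As b grows, prob_pool approaches max_pool, so b controls how "max-like" the pooling is.
print(max_pool.item(), lp_pool.item(), prob_pool.item())
```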
## [LeNet5 and Digit Recognition](1:11:20-1:27:58)

### [Implementation of LeNet5 in PyTorch](1:11:20-1:16:50)

LeNet5 consists of the following layers (1 being the top-most layer):

1. Log-softmax
2. Fully connected layer of dimensions 500x10
3. ReLU
4. Fully connected layer of dimensions (4x4x50)x500
5. Max pooling of dimensions 2x2, stride of 2
6. ReLU
7. Convolution with 50 output channels, 5x5 kernel, stride of 1
8. Max pooling of dimensions 2x2, stride of 2
9. ReLU
10. Convolution with 20 output channels, 5x5 kernel, stride of 1

The input is a 32x32 grayscale image (1 input channel).

LeNet5 can be implemented in PyTorch with the following code:

```python
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)   # flatten; a 4x4 spatial map corresponds to 28x28 (raw MNIST) inputs
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```

Although fc1 and fc2 are fully connected layers, they can be thought of as convolutional layers whose kernels cover the entire input. Fully connected layers are used for efficiency purposes. The same code can be expressed using `nn.Sequential`, but that style is considered outdated.

## [Advantages of CNN](1:16:50-1:20:21)

In a fully convolutional network, there is no need to specify the size of the input. However, changing the size of the input changes the size of the output.

Consider a cursive handwriting recognition system. We do not have to break the input image into segments. We can apply the CNN over the entire image: the kernels will cover all locations in the entire image and record the same output regardless of where the pattern is located. Applying the CNN over an entire image is much cheaper than applying it at multiple locations separately. No prior segmentation is required, which is a relief because the task of segmenting an image is similar to recognizing an image.
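Because fc1 and fc2 can be read as convolutions that cover the whole feature map, the same weights can slide over wider inputs. The sketch below is a hypothetical fully-convolutional variant of the network above, not the lecture's code: it assumes 32x32 training crops (as described in the text), so the first "FC" layer becomes a 5x5 convolution, and feeding a 32x64 image yields scores at 9 horizontal positions instead of a single prediction:

```python
import torch
import torch.nn.functional as F
from torch import nn

class SlidingLeNet(nn.Module):
    """Hypothetical fully-convolutional LeNet5-style network."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1_as_conv = nn.Conv2d(50, 500, kernel_size=5)  # plays the role of fc1 (5x5 maps for 32x32 inputs)
        self.fc2_as_conv = nn.Conv2d(500, 10, kernel_size=1)  # plays the role of fc2

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2, 2)
        x = F.relu(self.fc1_as_conv(x))
        return self.fc2_as_conv(x)  # 10-way scores at every remaining spatial position

net = SlidingLeNet()
print(net(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10, 1, 1]): one prediction
print(net(torch.randn(1, 1, 32, 64)).shape)  # torch.Size([1, 10, 1, 9]): predictions at 9 horizontal offsets
```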
### [Example: MNIST](1:20:21-1:27:58)

LeNet5 is trained on MNIST images of size 32x32 to classify individual digits in the center of the image. Data augmentation was applied by shifting the digit around, changing the size of the digit, and inserting digits to the side. It was also trained with an 11<sup>th</sup> category which represented none of the above. Images labelled by this category were generated either by producing blank images, or by placing digits at the side but not the center.

<center>
<img src="feature_binding.gif" width="300px" /><br>
<b>Fig. 3</b> Sliding Window ConvNet
</center>

The above image demonstrates that a LeNet5 network trained on 32x32 images can be applied to a 32x64 input image to recognize the digit at multiple locations.

### [Feature Binding Problem](1:27:58-1:31:11)

#### What is the feature binding problem?

Visual neuroscientists and computer vision researchers face the problem of defining an object as an object. An object is a collection of features, but how are all of the features bound together to form this object?

#### How to solve it?

We can solve this feature binding problem by using a very simple ConvNet: only two layers of convolutions with poolings plus another two fully connected layers, without any specific mechanism for it, given that we have enough non-linearities and data to train our ConvNet.

<center>
<img src="feature_binding.gif" width="300px" /><br>
<b>Fig. 4</b> ConvNet Addressing Feature Binding
</center>

The above animation showcases the ability of a ConvNet to recognize different digits by moving a single stroke around, demonstrating its ability to address the feature binding problem, i.e. recognizing features in a hierarchical, compositional way.

### [Example: Dynamic Input Length](1:31:11-1:40:34)

We can build a ConvNet with 2 convolution layers with stride 1 and 2 pooling layers with stride 2 such that the overall stride is 4. Thus, if we want to get a new output, we need to shift our input window by 4. To be more explicit, we can see the figure below (green units). First, we have an input of size 10, and we perform a convolution of size 3 to get 8 units. After that, we perform pooling of size 2 to get 4 units. Similarly, we repeat the convolution and pooling again and eventually get 1 output.

<center>
<img src="example.jpg" width="600px" /><br>
<b>Fig. 5</b> ConvNet Architecture On Variant Input Size Binding
</center>

Let's assume we add 4 units at the input layer (pink units above), so that we get 4 more units after the first convolution layer, 2 more units after the first pooling layer, 2 more units after the second convolution layer, and 1 more output. Therefore, the window size to generate a new output is 4 (stride 2 x 2).<!--the overall subsampling we have shown from input to output is 4 (2x2)--> Moreover, this demonstrates that if we increase the size of the input, we will increase the size of every layer.

## [What are ConvNets Good For](1:40:34-1:45:40)

ConvNets are good for signals that come to you in the form of multidimensional arrays. Such signals have two major characteristics:

1. **Locality**: There is a strong local correlation between values. If we take two nearby pixels of a natural image, those pixels are very likely to have the same color. As two pixels get further apart, the similarity between them decreases. The local correlations help us detect local features, which is what ConvNets do. If we feed a ConvNet permuted pixels, it will not perform well at recognizing the input images, while an FC network will not be affected. Local correlation justifies local connections.
2. **Stationarity**: Features are essential and can appear anywhere on the image, which justifies the shared weights and pooling. Moreover, statistical signals are uniformly distributed, which means we need to repeat the feature detection over every location.

Furthermore, ConvNets are put to good use on videos, images, text, and speech recognition.
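To make the dynamic-input-length arithmetic from Fig. 5 concrete, here is a minimal 1D sketch (the channel counts and the pooling type are arbitrary choices, not from the lecture). Two size-3 convolutions with stride 1 and two size-2 poolings with stride 2 give an overall stride of 4, so 4 extra input units yield exactly one extra output:

```python
import torch
import torch.nn.functional as F
from torch import nn

conv1 = nn.Conv1d(1, 1, kernel_size=3, stride=1)
conv2 = nn.Conv1d(1, 1, kernel_size=3, stride=1)

def net(x):
    x = F.max_pool1d(conv1(x), kernel_size=2, stride=2)  # length 10 -> 8 -> 4
    x = F.max_pool1d(conv2(x), kernel_size=2, stride=2)  # length 4 -> 2 -> 1
    return x

print(net(torch.randn(1, 1, 10)).shape)  # torch.Size([1, 1, 1]): one output
print(net(torch.randn(1, 1, 14)).shape)  # torch.Size([1, 1, 2]): 4 extra inputs -> one extra output
```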
