UPENDRA KUMAR
# Mask RCNN

### Introduction

Mask R-CNN is a landmark 2017 work by Kaiming He and his co-authors. It performs instance segmentation together with object detection and achieves excellent results: without any tricks, it outperformed the winners of the COCO 2016 challenge. The network design is also relatively simple. On top of Faster R-CNN, one more branch is added to the original two branches (classification and coordinate regression) to perform segmentation, as shown in the following figure.

![](https://i.imgur.com/lKcRBsb.png)

#### Mask R-CNN detailed introduction

So why does this network give such good results, and what are the network details? The following sections go through them one by one.

Before introducing Mask R-CNN, it helps to understand what segmentation is, because that is the task Mask R-CNN solves. The figure below illustrates several kinds of segmentation, of which Mask R-CNN performs instance segmentation.

- **Semantic segmentation**: classifies an image pixel by pixel.
- **Instance segmentation**: detects the objects in an image and segments each detected object.
- **Panoptic segmentation**: describes all objects in the image.

The following picture shows the difference between these kinds of segmentation. As can be seen, panoptic segmentation is the most difficult:

![](https://i.imgur.com/1H3re6I.png)

##### - Instance segmentation must not only find the objects in the image correctly, but also segment them accurately.

Instance segmentation can therefore be seen as a combination of object detection and semantic segmentation.

##### - Mask R-CNN is an extension of Faster R-CNN.

For each proposal box of Faster R-CNN, an FCN is used for semantic segmentation. The segmentation task is performed simultaneously with the localization and classification tasks.

##### - Introduced RoI Align instead of RoI Pooling in Faster RCNN.
Because RoI Pooling is not pixel-to-pixel aligned, it may not have a great impact on the bounding box, but it has a great impact on the accuracy of the mask. With RoI Align, mask accuracy improves significantly, by a relative 10% to 50%, as explained in Section 3.

##### - The semantic segmentation branch is introduced to decouple mask prediction from class prediction.

The mask branch only performs segmentation, and the task of predicting the class is assigned to another branch. This is different from the original FCN network: when the original FCN predicts the mask, it also predicts the class to which the mask belongs.

##### - Without using fancy methods, Mask R-CNN surpassed all state-of-the-art models of the time.

##### - Trained on an 8-GPU server for two days.

#### Mask R-CNN algorithm steps

- First, take the image you want to process and apply the corresponding pre-processing (or start from an already pre-processed image).
- Then, feed it into a pre-trained neural network (ResNeXt, etc.) to obtain the corresponding feature map.
- Next, set a predetermined number of ROIs for each point in this feature map to obtain multiple candidate ROIs.
- Then, send these candidate ROIs to the RPN for binary classification (foreground or background) and bounding-box (BB) regression, which filters out some of the candidate ROIs.
- Next, perform the ROIAlign operation on the remaining ROIs (that is, first align the pixels of the original image with the feature map, then align the feature map with the fixed-size feature).
- Finally, for these ROIs, perform classification (N-class classification), BB regression, and mask generation (an FCN operation is performed inside each ROI).

#### Mask R-CNN architecture decomposition

Here, I decompose Mask R-CNN into the following three modules:

1. Faster R-CNN
2. ROIAlign
3. FCN
These three modules are the core of the algorithm.

### FCN

The FCN algorithm is a classic semantic segmentation algorithm that can accurately segment objects in a picture. The overall architecture is shown in the figure below. It is an end-to-end network whose main modules are convolution and deconvolution. The image is first convolved and pooled, which reduces the size of the feature map. A deconvolution (that is, an interpolation) operation then progressively enlarges the feature map again, and finally each pixel is classified. In this way the input image is segmented accurately.

![](https://i.imgur.com/jUlzglk.png)

##### Analysis and comparison of ROIPooling and ROIAlign

![](https://i.imgur.com/Pr2FdXC.jpg)

**The biggest difference between ROI Pooling and ROIAlign is that the former uses two quantization operations, while the latter uses no quantization and relies on a linear interpolation algorithm instead.**

![](https://i.imgur.com/0WNo9kL.png)

#### How does Mask R-CNN achieve good results?

First of all, the difficulty of instance segmentation is that you need to detect the position of the target and segment the target at the same time, so you need to integrate object detection (framing the target's position) and semantic segmentation (classifying the pixels to segment the target). Prior to Mask R-CNN, Faster R-CNN performed well in object detection, while FCN performed well in semantic segmentation. So the natural approach is to combine Faster R-CNN and FCN.

#### So how does Mask R-CNN do it?

Mask R-CNN is based on Faster R-CNN, so we first review Faster R-CNN. Faster R-CNN is a typical two-stage object detection method. First, the RPN generates candidate regions; then the candidate regions pass through RoI Pooling for object detection (including classification and coordinate regression). Classification and regression share the preceding network.
#### What improvements has Mask R-CNN made?

Mask R-CNN is also two-stage, and the RPN part is the same as in Faster R-CNN. Mask R-CNN then adds a third branch on top of Faster R-CNN that outputs the mask of each ROI. This is the biggest difference from the traditional approach, which generally uses an algorithm to generate a mask and then classifies it; here the two are performed in parallel. Naturally, this becomes a multi-task problem.

#### Mask R-CNN Network

Mask R-CNN basic structure: it uses the same two-stage procedure as Faster R-CNN. First, the RPN proposes RoIs; then, for each RoI found by the RPN, the network classifies it, localizes it, and predicts a binary mask. This is different from other networks that first find the mask and then classify it.

Mask R-CNN's loss function:

![](https://i.imgur.com/q9Olqj3.jpg)

Mask representation: because there is no fully connected layer and RoIAlign is used, a one-to-one correspondence between output and input pixels can be achieved.

#### RoIAlign

The purpose of RoIPool is to derive a small feature map (e.g. 7x7) from the ROI determined by the RPN. The size of the ROI varies, but after RoIPool it is always 7x7. The RPN proposes a number of RoIs with coordinates [x, y, w, h], which are fed into RoI Pooling, and a 7x7 feature map is output for classification and localization.

The problem is that the output size of RoI Pooling is fixed at 7x7. If the RoI output by the RPN has size 8x8, there is no guarantee that input pixels and output pixels are in one-to-one correspondence. First, the output pixels contain different amounts of information (some map 1 to 1, some map 1 to 2). Second, their coordinates cannot be mapped back to the input (which input pixel coordinate does an output pixel that covers two input pixels correspond to?). This has little effect on classification, but a great effect on segmentation.
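The 8-to-7 mismatch described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: `roipool_bins`, `roialign_centers`, and `bilinear` are hypothetical helper names, the quantization scheme (flooring both bin edges) is a simplification, and real RoIPool/RoIAlign implementations differ in detail.

```python
import math

def roipool_bins(start, length, n_bins):
    """Integer pixel range [lo, hi) covered by each output bin when the
    bin edges are quantized by flooring (a simplified RoIPool-style scheme)."""
    bins = []
    for i in range(n_bins):
        lo = start + math.floor(i * length / n_bins)
        hi = start + math.floor((i + 1) * length / n_bins)
        bins.append((lo, hi))
    return bins

def roialign_centers(start, length, n_bins):
    """Real-valued centre of each bin, kept unquantized (RoIAlign-style)."""
    bin_size = length / n_bins
    return [start + (i + 0.5) * bin_size for i in range(n_bins)]

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D nested list at a real-valued (y, x),
    assuming 0 <= y <= H-1 and 0 <= x <= W-1."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(feat) - 1)
    x1 = min(x0 + 1, len(feat[0]) - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

# An RoI that is 8 pixels wide, pooled into 7 bins along one axis.
widths = [hi - lo for lo, hi in roipool_bins(0, 8, 7)]
print(widths)                      # six bins cover 1 pixel, one covers 2
print(roialign_centers(0, 8, 7))   # fractional sampling positions
```

With quantized edges the bins cover unequal numbers of input pixels, which is the "1 to 1 versus 1 to 2" problem; keeping the fractional positions and reading the feature map with `bilinear` avoids that rounding.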
The output coordinates of RoIAlign are obtained with an interpolation algorithm and are no longer quantized. The value in each bin is likewise computed by interpolation rather than taken as the maximum over quantized cells.

![](https://i.imgur.com/3yaTulq.png)

**Comparison of ROI Pool and ROIAlign performance**

![](https://i.imgur.com/fWtlLvl.jpg)

From the previous analysis, we can draw the qualitative conclusion that ROIAlign greatly improves detection performance. The table above gives the quantitative picture: ROIAlign increased the mask AP75 by 10.5 points and the box AP75 by 9.5 points.

**Comparison of Multinomial and Binary loss**

![](https://i.imgur.com/DXN7D7z.jpg)

According to the table above, Mask R-CNN uses two branches to decouple classification from mask generation, and uses a binary loss instead of a multinomial loss, which eliminates competition between the masks of different classes. The mask to output is selected according to the class label predicted by the classification branch. The mask branch does not need to classify again, and performance improves.

**Performance comparison between MLP and FCN mask**

![](https://i.imgur.com/9gIhB4Y.jpg)

In the table above, MLP uses fully connected layers to generate the mask, while FCN uses convolutions. The latter has far fewer parameters than the former, which not only saves a lot of memory but also speeds up the whole training process (fewer parameters need to be evaluated and updated). In addition, because the features obtained by the MLP are relatively abstract, some useful information is lost in the final mask. We can see the difference intuitively on the right. From a quantitative perspective, FCN increased the mask AP by 2.1 points.
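The decoupling of class prediction and mask prediction discussed above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: the function names and the nested-list layout (`mask_logits[k]` is the m x m logit map for class `k`) are assumptions made for the example.

```python
import math

def bce_with_logit(z, t):
    """Numerically stable binary cross-entropy for one logit z and target t in {0, 1}."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary loss, computed only on the mask of the
    ground-truth class; the masks of the other classes do not contribute."""
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, t in zip(logit_row, target_row):
            total += bce_with_logit(z, t)
            count += 1
    return total / count

def select_mask(mask_logits, predicted_class, threshold=0.5):
    """At inference, pick the mask of the class chosen by the classification
    branch and binarize it; sigmoid(z) >= threshold is tested on the logit."""
    logit_threshold = math.log(threshold / (1.0 - threshold))
    return [[1 if z >= logit_threshold else 0 for z in row]
            for row in mask_logits[predicted_class]]

# Two classes, 1x2 masks: class 0 is undecided, class 1 is confident.
logits = [[[0.0, 0.0]], [[5.0, -5.0]]]
gt = [[1, 0]]
print(mask_loss(logits, gt, gt_class=0))   # depends on class 0's logits only
print(select_mask(logits, predicted_class=1))
```

Because each class has its own sigmoid output, changing the logits of class 1 leaves the loss for a class-0 RoI untouched, which is the "no competition between classes" property the table compares against a softmax over classes.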
### Network Architecture

For clarity, architectures are distinguished along two axes:

- Different backbones: resnet-50, resnet-101, resnext-50, resnext-101.
- Different head architectures: when Faster R-CNN uses resnet50, the features used by the RPN are taken from CONV4. This is called ResNet-50-C4.

In addition to these structures, the authors use a more effective backbone, FPN.

![](https://i.imgur.com/UniTrFP.jpg)

![](https://i.imgur.com/Crp0wBY.jpg)

In the figure above, the red BB in the image indicates the detected target. We can see with the naked eye that the detection result is not very good: the entire BB is shifted slightly to the right, and some pixels on the left are not included in the BB. The end result shown on the right is nevertheless perfect.

#### Equivariance in Mask R-CNN

Equivariance means that the output changes correspondingly as the input changes.

![](https://i.imgur.com/TrGLGzn.jpg)

Equivariance 1: the fully convolutional features (the Faster R-CNN network) are equivariant to transformations of the image; that is, as the image is transformed, the fully convolutional features change correspondingly.

![](https://i.imgur.com/8QeOV6l.jpg)

Equivariance 2: the fully convolutional operation on the ROI (the FCN network) is equivariant to transformations within the ROI.

![](https://i.imgur.com/WjrL72G.jpg)

Equivariance 3: the ROIAlign operation preserves equivariance before and after the ROI transformation.

![](https://i.imgur.com/Ao0NDRc.jpg)

Full Convolution in ROI

![](https://i.imgur.com/x4ZWgVp.jpg)

Dimension alignment of ROIAlign

![](https://i.imgur.com/wWR7WiU.jpg)

### Network Training

Training is basically the same as for Faster R-CNN. An RoI with IoU > 0.5 is a positive sample, and the mask loss Lmask is calculated only for positive samples. The image is resized so that its short side is 800 pixels. The ratio of positive to negative samples is 1:3. The RPN uses 5 scales and 3 aspect ratios.
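The positive/negative rule above depends on the IoU between an RoI and the ground-truth boxes. A minimal sketch follows, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples and treating an IoU of exactly 0.5 as positive; `iou` and `label_rois` are illustrative names, not part of any particular library.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_rois(rois, gt_boxes, threshold=0.5):
    """True for RoIs whose best IoU with any ground-truth box reaches the
    threshold (positive samples), False otherwise (negative samples)."""
    return [max((iou(r, g) for g in gt_boxes), default=0.0) >= threshold
            for r in rois]

gt_boxes = [(0, 0, 2, 2)]
rois = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 0, 2, 1)]
print([iou(r, gt_boxes[0]) for r in rois])
print(label_rois(rois, gt_boxes))
```

Only the RoIs labelled positive here would contribute to the mask loss; the 1:3 sampling ratio would then be applied when drawing a training batch from the two groups.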
#### Inference Details

Mask R-CNN with the ResNet (C4) backbone generates 300 candidate regions for classification and regression, and with FPN it generates 1000 candidate regions. Non-maximum suppression is then applied, and **the mask branch is finally run only on the 100 highest-scoring detections**. Unlike in training, the branches do not run in parallel here; the authors explain that this improves both accuracy and efficiency.

The mask branch can predict masks for k categories, but here the mask of the category given by the classification result is selected, resized to the size of the ROI, and binarized with a threshold of 0.5. (**Resizing requires interpolation, so the mask has to be binarized afterwards. For the mask size m, see the figure above. The mask is not the size of the ROI but a relatively small image, so the resize operation is required.**)

### Experimental results

First, the instance segmentation results of the Mask R-CNN algorithm on the COCO dataset:

![](https://i.imgur.com/eMJNcvB.png)

Comparison of the Mask R-CNN algorithm with other instance segmentation algorithms (MNC and FCIS are the winners of the COCO 2015 and 2016 segmentation challenges, respectively):

![](https://i.imgur.com/jvI83jA.png)

Table 2 compares some details:

##### (a) Comparison of Mask R-CNN under different feature extraction networks.

ResNet-50-C4 indicates that the extracted features are the output of ResNet's stage 4; in other words, the input of ROI Pool or ROIAlign is the output of stage 4. It can be seen that deeper networks, or better feature extraction networks, bring further improvements.

##### (b) Comparison between sigmoid and softmax.

##### (c) Comparison of ROI Pool, ROIWarp, and ROIAlign on ResNet-50-C4.

This shows the effectiveness of ROIAlign, and that the type of pooling has little effect on ROIAlign's result.
##### (d) Comparison between ROI Pool and ROIAlign on ResNet-50-C5.

It can be seen that ROI Pool performs worse here than when features are extracted from C4; after all, the higher the feature level, the greater the quantization error. In contrast, ROIAlign with C5 features performs better than with C4 features, which indicates that the error introduced by ROIAlign is very small. This experiment is important because it largely solves the long-standing problem that features with a large receptive field gave poor detection and segmentation results.

##### (e) Comparison of the experimental results of the mask branch using FCN and MLP.

![](https://i.imgur.com/3QIzLhf.png)

In addition to the instance segmentation results, the authors also give object detection results in the article, as shown in Table 3. It can be seen that simply replacing ROIPool in Faster R-CNN with ROIAlign already improves detection significantly. In addition, Mask R-CNN further improves detection because it receives additional mask-related supervision during training.

![](https://i.imgur.com/t3cJrsO.png)

### Summary

1. Mask R-CNN is a very flexible framework. Different branches can be added to complete different tasks, such as object classification, object detection, semantic segmentation, instance segmentation, and human pose estimation.
2. It is indeed a good algorithm!
3. Goals of Mask R-CNN:
    - High speed
    - High accuracy (high classification accuracy, high detection accuracy, high instance segmentation accuracy, etc.)
    - Simple and intuitive
    - Easy to use
