# ICIP rebuttal

## Review summary

| | <span style="color:blue">050E</span> | <span style="color:red">0C97</span> | <span style="color:green">02A3</span> | <span style="color:yellow">1358</span> |
| ------------------------ | ----------------------- | --------------------------- | ----------------------- | ----------------------- |
| Importance/Relevance     | Sufficient interest     | Sufficient interest         | Sufficient interest     | Sufficient interest     |
| Novelty                  | Moderately original     | **Very original**           | Moderately original     | Moderately original     |
| Technical Correctness    | Probably correct        | Probably correct            | Probably correct        | Probably correct        |
| Experimental Validation  | Limited but convincing  | **Lacking in some respect** | Limited but convincing  | Limited but convincing  |
| Clarity of Presentation  | Clear enough            | Clear enough                | Clear enough            | Clear enough            |
| References to Prior Work | References adequate     | References adequate         | References adequate     | References adequate     |

## Per-review details

### Review <span style="color:blue">050E</span>

_Justification of Novelty_: the reviewer correctly understood the goal of the contribution.

_Justification of Experimental Validation Score_: the reviewer mentions **VGG16**, whereas we tested on **VGG19**.

_Justification of Clarity of Presentation Score_: the reviewer criticises a lack of clarity on how the MP results are computed.

_Justification of Reference to Prior Work_: the reviewer notes the absence of comparisons with other state-of-the-art methods and suggests a comparison with the paper "A Deep Neural Network Pruning Method Based on Gradient L1-norm", IEEE ICCC 2020.

### Review <span style="color:red">0C97</span>

_Justification of Novelty/Originality/Contribution Score_: the reviewer understood the goal of the contribution and finds the reparametrisation formulation "really worthwhile".

_Justification of Experimental Validation Score_: the reviewer would like results on a **larger dataset** with **larger images**.

_Additional comments to author(s)_: the reviewer suggests providing **use cases** in which it is interesting not to fine-tune the network.

### Review <span style="color:green">02A3</span>

_Justification of Novelty/Originality/Contribution Score_: the reviewer understood the goal of the contribution.

### Review <span style="color:yellow">1358</span>

_Justification of Novelty/Originality/Contribution Score_: the reviewer understood the goal of the contribution, but points out that **our method underperforms mag+FT**. In their view this weakens the justification of our method, since there is a **trade-off**.

## Points to address in the rebuttal

1. <span style="color:blue">@050E</span> Clarify how the magnitude-pruning scores are computed.
2. <span style="color:blue">@050E</span> Compare against, or explain the methodological differences with, the paper they mention.
3. <span style="color:red">@0C97</span> Provide some numbers on larger datasets.
4. <span style="color:red">@0C97</span> Provide use cases in which pruning without fine-tuning is worthwhile.
5. <span style="color:yellow">@1358</span> Discuss the lower performance compared with mag+FT?
6. <span style="color:yellow">@1358</span> Address the trade-off?

## Response elements
1. The fine-tuned magnitude-pruning scores are obtained by evaluating, on the CIFAR-10 test set, a network that went through three steps: initial training, pruning to the target rate, and fine-tuning. As stated in the paper, the fine-tuning step is performed in the same way as the training step, but starting from the trained and pruned network. The performance reported in the table is the best test accuracy over 5 runs (a minimal sketch of this baseline follows this list).

2. **We need to obtain the paper.**

3. CONV4 results on Tiny ImageNet. Tiny ImageNet answers the criticism: a larger dataset with larger images.

   | Method \ pruning rate | 90% | 93% | 95% | 97% | 99% |
   | --------------------- | ------- | ------- | ------- | ------- | ------- |
   | MP                    | 26%     | 18%     | 10%     | 3%      | 0.5%    |
   | MP + FT               | **45%** | **45%** | **45%** | **43%** | 21%     |
   | Ours                  | 39%     | 39%     | 39%     | 39%     | **35%** |

   For a CONV4-type network trained on Tiny ImageNet under the same conditions as described in the experiments section of the paper, at a 99% pruning rate we gain 14 points of test accuracy over magnitude pruning **with** fine-tuning.

4. Our method has two advantages over fine-tuning. The first is that magnitude pruning is performed a posteriori and ignores the network topology. Weight distributions differ from layer to layer, some layers being much more concentrated around 0 than others. A pruning rate that is too high can therefore remove all the weights of a single layer (this usually happens in the last layers of the network, those close to the logits, especially when they are fully connected). The network then becomes disconnected (see the sketch below): during training, the weights upstream of the disconnection are no longer updated by backpropagation, and the disconnection even prevents information from propagating from the first layers to the logits, so the network always produces the same output. In contrast, our method is end-to-end and jointly optimises the weights and the topology in a progressive manner, which avoids the disconnections potentially created by a posteriori pruning. The second advantage, as detailed in the paper, is that the pruned network does not need to be fine-tuned. This is particularly useful in applied settings, notably when deploying face-recognition models on connected smart cameras. Training is an extremely long process that can take several months, and since the setting is embedded, it is valuable to obtain the lightest possible model, notably through pruning. The time saved can therefore be substantial.
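For item 1 above (and the disconnection issue of item 4), here is a minimal PyTorch-style sketch of the a posteriori magnitude-pruning baseline. It is an illustrative reconstruction under assumptions, not the paper's code: `build_network`, `train` and `evaluate` are hypothetical helpers, and a real run would re-apply the pruning masks during fine-tuning.

```python
# Illustrative sketch only: assumed helpers, not the paper's implementation.
import torch


def global_magnitude_prune(model: torch.nn.Module, prune_rate: float) -> None:
    """Zero out the `prune_rate` fraction of conv/linear weights with the smallest
    magnitude, using a single global threshold (a posteriori magnitude pruning)."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    magnitudes = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(prune_rate * magnitudes.numel()))
    threshold = magnitudes.kthvalue(k).values      # k-th smallest |w| over the whole net

    for w in weights:
        mask = (w.detach().abs() > threshold).float()
        w.data.mul_(mask)
        # Disconnection risk (item 4): a layer whose weights all fall below the
        # global threshold is entirely zeroed, cutting the forward/backward path.
        if mask.sum() == 0:
            print(f"layer of shape {tuple(w.shape)} fully pruned: network disconnected")


# Baseline protocol of item 1 (hypothetical helpers; best test accuracy over 5 runs):
# model = build_network(); train(model)              # initial CIFAR-10 training
# global_magnitude_prune(model, prune_rate=0.99)     # prune to the target rate
# train(model)                                       # fine-tune with the same schedule,
#                                                    # keeping the pruning masks applied
# accuracy = evaluate(model)                         # CIFAR-10 test accuracy
```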
## Rebuttal

After carefully reading the reviewers' comments, our rebuttal provides additional information on four points: the methodology used to obtain the fine-tuned magnitude pruning results, a comparison with [1], insight into our method's behaviour on larger datasets, and finally some clarifications on the main advantages of our method compared to magnitude pruning or methods that require fine-tuning.

The fine-tuned magnitude pruning scores are obtained by evaluating the accuracy, on the CIFAR-10 test set, of a network that underwent the following steps: initial training, pruning at the targeted pruning rate (the percentages in the column headers of Tables 1, 2 and 3 are the percentages of removed weights) and then fine-tuning. As stated in Section 3.1 of the paper, the fine-tuning procedure uses the same setup as the training procedure (same batch hyperparameters and learning rate schedule), but the initial network is the trained and pruned network. The indicated performance is the best of 5 runs.

Review 050E asks what our contribution is compared to [1]. First, we kindly remind 050E that we did not test our method on VGG16 but on VGG19. Then, [1] is a structured pruning method, which prunes filters based on the L1-norm of their associated gradients. As mentioned in Section 1 of our paper, structured pruning introduces a strong prior on the network topology in order to avoid sparse networks. In contrast, we chose unstructured pruning to ensure our method is not constrained in obtaining a new topology. Moreover, we target higher pruning rates (90% and more) than Liu et al., who report pruning rates up to 71%. Very high pruning rates (90%+) ensure that we take full advantage of sparse computation libraries.

Review 0C97 pointed out that tests on larger datasets with larger images would be more informative. Therefore we chose the Tiny ImageNet dataset [2], which consists of 100k 64x64 color images split into 200 classes. At pruning rates of 90%, 95% and 97% our method performs much better than the magnitude pruning method, with gains in test accuracy from 13 to 36 points, while staying only 6 points below the fine-tuned magnitude pruning method. At the very high pruning rate of 99%, our method outperforms both magnitude pruning and fine-tuned magnitude pruning, with gains of 34.5 and 14 points respectively, at a test accuracy of 35%.

Our method exhibits two main advantages. The first is related to disconnection in the network. The magnitude pruning method prunes the weights a posteriori, after training, considering only the absolute values of the weights as a saliency criterion. However, the weight distributions are different from one layer to another: some layers have weight distributions that are much more centered around zero than others. Hence, an excessive pruning rate might prune all the weights of a layer (which generally happens in the last layers of a network, particularly if these layers are fully connected). As a consequence, we observe a disconnection in the network, leading to 1) the impossibility of fine-tuning the upstream layers (the backpropagation signal is lost after the disconnection) and 2) random-chance outputs from the network (the input no longer influences the output). In contrast, since our method is end-to-end, it jointly optimizes both the weights and the topology in a progressive manner, which protects the network from the potential disconnections introduced by a posteriori pruning. The second advantage of our method is that it does not require fine-tuning. This is particularly useful in applied cases such as embedded neural networks. On the one hand, pruning yields a lightweight model that can be shipped on low-power devices; on the other hand, it avoids a fine-tuning step that can last for months in some cases, allowing for faster development cycles and faster model shipping to the final customer.

[1] Liu et al., "A Deep Neural Network Pruning Method Based on Gradient L1-norm", IEEE ICCC 2020
[2] https://tiny-imagenet.herokuapp.com/

→ 653 words
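The claim about sparse computation libraries in the draft above can be made concrete with a quick back-of-the-envelope estimate. The sketch below is purely illustrative and assumes a hypothetical 4096x4096 fully connected layer (not one of the paper's layers): with 4-byte values and 4-byte indices, a CSR representation only breaks even with dense storage around 50% sparsity, which is why the 90%+ pruning rates we target are where sparse formats start paying off.

```python
# Back-of-the-envelope memory estimate; the 4096x4096 layer is an assumption.
def csr_bytes(n_rows: int, n_cols: int, sparsity: float,
              value_bytes: int = 4, index_bytes: int = 4) -> int:
    """Approximate CSR size: non-zero values + column indices + row pointers."""
    nnz = int(n_rows * n_cols * (1.0 - sparsity))
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


rows = cols = 4096
dense_bytes = rows * cols * 4                       # dense float32 weights: 64 MiB
for sparsity in (0.5, 0.9, 0.99):
    print(f"sparsity {sparsity:.0%}: CSR ~ {csr_bytes(rows, cols, sparsity) / 2**20:.1f} MiB "
          f"vs dense {dense_bytes / 2**20:.1f} MiB")
```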
## Shortened version

050E: The fine-tuned magnitude pruning (FMP) scores are obtained by evaluating the accuracy on the CIFAR-10 test set of a network that underwent full training, pruning to the targeted rate and then fine-tuning (with the same hyperparameters as training). The indicated performance is the best of 5 runs. 050E asks what our contribution is compared to [1]. First, we kindly remind 050E that we did not test our method on VGG16 but on VGG19. Then, [1] is a structured pruning method, which prunes filters. As mentioned in Section 1 of our paper, structured pruning introduces a strong prior on the network topology in order to avoid sparse networks. In contrast, we chose unstructured pruning to ensure our method is not constrained in the topology search. Moreover, we target higher pruning rates (90%+) than Liu et al., who report pruning rates up to 71%. Very high pruning rates (90%+) ensure that we take full advantage of sparse computation libraries on embedded devices.

0C97 asked for tests on larger datasets with larger images. We chose the Tiny ImageNet dataset [2], which consists of 100k 64x64 color images split into 200 classes. At pruning rates of 90%, 95% and 97% our method performs much better than the magnitude pruning (MP) method, with gains in test accuracy from 13 to 36 points, and only 6 points lower than the FMP method. At 99%, our method outperforms both MP and FMP with gains of 34.5 and 14 points respectively, at a test accuracy of 35%.

0C97 & 1358: The first advantage of our method is related to disconnection in the network. The MP method prunes the weights after training, considering only the magnitude of the weights as a saliency criterion. However, the weight distributions are different from one layer to another: some layers have weight distributions that are much more centered around zero than others. Hence, an excessive pruning rate might prune all the weights of a layer (which generally happens in the last layers of a network, particularly if these layers are fully connected). As a consequence, we observe a disconnection in the network, leading to 1) the impossibility of fine-tuning the upstream layers and 2) random-chance outputs from the network. In contrast, since our method is end-to-end, it jointly optimizes both the weights and the topology in a progressive manner, which prevents the potential disconnections introduced by a posteriori pruning. The second advantage of our method is that it does not need fine-tuning. This is particularly useful in applied cases such as embedded neural networks. On the one hand, pruning yields a lightweight model that can be shipped on low-power devices. On the other hand, it avoids a fine-tuning step that can last for months in some cases, allowing for faster development cycles and model shipping to the final customer.

[1] Liu et al., "A Deep Neural Network Pruning Method Based on Gradient L1-norm", IEEE ICCC 2020
[2] https://tiny-imagenet.herokuapp.com/

→ 486 words

## Revised version

050E: The fine-tuned magnitude pruning (FMP) scores are obtained by evaluating the accuracy on the CIFAR-10 test set of a network that underwent full training, pruning to the targeted rate and then fine-tuning (with the same hyperparameters as training). The indicated performance is the best of 5 runs.
050E asks what our contribution is compared to [1] (2020), which, incidentally, is extremely similar to [3]: the same saliency criterion applied to gradients instead of weights. First, we kindly remind 050E that we did not test our method on VGG16 but on VGG19. Then, [1] is a structured pruning method, which prunes filters. As mentioned in Section 1 of our paper, structured pruning introduces a strong prior on the network topology in order to avoid sparse networks. In contrast, we chose unstructured pruning to ensure our method is not constrained in the topology search. Moreover, we target higher pruning rates (90%+) than Liu et al., who report pruning rates (PR) up to 71%. Very high PR (90%+) ensure that we take full advantage of sparse computation libraries on embedded devices.

0C97 asked for tests on larger datasets with larger images. We chose the Tiny ImageNet dataset [2], which consists of 100k 64x64 color images split into 200 classes. At PR of 90%, 95% and 97% our method performs much better than the magnitude pruning (MP) method, with gains in test accuracy from 13 to 36 points, and only 6 points lower than the FMP method. At 99%, our method outperforms both MP and FMP with gains of 34.5 and 14 points respectively, at a test accuracy of 35%. We would be able to include these results in an updated version of the paper.

0C97 & 1358: The first advantage of our method is related to disconnection in the network. The MP method prunes the weights after training, considering only the magnitude of the weights as a saliency criterion. However, the weight distributions are different from one layer to another: some layers have weight distributions that are much more centered around zero than others. Hence, an excessive PR might prune all the weights of a layer (which generally happens in the last layers of a network, particularly if these layers are fully connected). As a consequence, we observe a disconnection in the network, leading to 1) the impossibility of fine-tuning the upstream layers and 2) random-chance outputs from the network. In contrast, since our method is end-to-end, it jointly optimizes both the weights and the topology in a progressive manner, which prevents the potential disconnections introduced by an a posteriori pruning. The second advantage of our method is that it does not need fine-tuning. This is particularly useful in applied cases such as embedded neural networks (including face recognition). On the one hand, pruning yields a lightweight model that can be shipped on low-power devices. On the other hand, it avoids a fine-tuning step that can last for months in some cases, allowing for faster development cycles and model shipping to the final customer.

[1] Reference suggested by 050E.
[2] https://github.com/ksachdeva/tiny-imagenet-tfds
[3] [16] in our paper.

## Revised Version 2

Review 050E: The fine-tuned magnitude pruning scores correspond to the accuracy (on CIFAR) of lightweight networks obtained by (i) training primary networks (VGG19, etc.), (ii) pruning their weights to the targeted rate, and (iii) fine-tuning these weights using the same hyper-parameters. We kindly remind that this training process is achieved on top of VGG19 and not VGG16. Regarding the suggested reference [Liu-et-al-2020] (closely related to a group of methods, such as [3], already cited in our paper): this reference uses the same saliency criterion as [3] but applied to gradients instead of weights.
Moreover, [Liu-et-al-2020] is based on structured pruning: as mentioned in Section 1 of our paper, structured pruning relies on strong priors on the topology of the trained networks, which make balancing high pruning rates and accuracy very hard to achieve. In contrast, our proposed unstructured pruning avoids these rigid priors by training the weights of the networks while also adapting their topology. In practice, our method achieves very high pruning rates (90%+) compared to those provided in [Liu-et-al-2020], which do not exceed 71%, and this allows taking full advantage of sparse computation libraries on embedded devices.

Review 0C97 suggested extra experiments on larger datasets and images. We recently applied our method to Tiny ImageNet (https://github.com/ksachdeva/tiny-imagenet-tfds), which consists of 100,000 images of 64x64 pixels belonging to 200 classes. At high pruning rates, for instance 99%, our method overtakes not only magnitude pruning (MP) but also fine-tuned MP by a significant margin (34.5 and 14 points respectively). These recent extra results can be added to our tables for different pruning rates.

Reviews 0C97 & 1358: in contrast to MP, one of the major advantages of our method is its ability to prevent disconnections. Indeed, as MP decouples weight training from pruning, one may end up with pruned networks that are topologically inconsistent. In other words, networks with heterogeneous weight distributions may end up with completely disconnected layers (especially fully connected ones) when their incoming or outgoing connections are completely pruned. This behavior, observed at high pruning rates, makes fine-tuning powerless to restore a high accuracy and results in random classification performance. In contrast, our method jointly optimizes network weights and topology, and thereby prevents disconnections. As a second advantage, our method bypasses the fine-tuning step, which may last for months in some applications. This allows faster development cycles and yields lightweight embedded networks that can easily be shipped and deployed on low-power devices.
