# UNDERSTANDING WEIGHT-MAGNITUDE HYPERPARAMETERS IN TRAINING BINARY NETWORKS: a compact review

This review was written by: Bas Rutteman - 5429439, b.rutteman@student.tudelft.nl

## 1. Introduction

For this blog post, a short and compact review was written about the research paper "*Understanding weight-magnitude hyperparameters in training binary networks*" by Quist et al. [1]. The aim of this review is to offer another point of view on the research paper, in order to make its theory easier to comprehend. Furthermore, this review also aims to clarify the research's role so far in the field of neural networks. This was done by looking at the research on which the paper is built, while also investigating whether any research has already applied or used the paper's findings. Additionally, a SWOT analysis was performed.

## 2. Problem formulation of the paper

Generally, the values of a neural network's (NN) hyperparameters are tuned carefully, since these parameters influence the magnitude of the real-valued weights during and after training. However, for a particular class of networks, binary neural networks (BNNs), weight magnitude is absent, which feeds the assumption that these hyperparameters do not make any meaningful contribution. Because of this, a more thorough insight into the mechanism of training BNNs with respect to their hyperparameters is valuable. The main point of view the paper uses to arrive at this insight is supplied by theoretical work by Helwegen et al. [2], in which latent real-valued weight optimization is approximated using a variable called the 'accumulated gradient'. More about this research can be found in Section 5.

## 3. Theory behind binary neural networks [1][4]

Binary neural networks (BNNs) do not use weights of any magnitude; instead, each weight is tuned towards a specific binary value of either -1 or +1. However, due to their binary nature, the problem of zero gradients inevitably arises during regular training of such a network. Because of this, these networks are generally trained using latent real-valued weights. These weights, on the other hand, do have a magnitude and are supposedly comparable to the ordinary weights seen in any MLP. [1][4]
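To make the zero-gradient problem concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): binarizing latent weights with the sign function gives values in {-1, +1}, but the derivative of the sign function is zero almost everywhere, so gradients cannot reach the binary weights directly and training has to rely on the latent real-valued weights.

```python
import numpy as np

# Minimal illustration (not from the paper): binarizing latent weights with
# sign() yields values in {-1, +1}, but sign() has zero derivative almost
# everywhere, so gradients cannot reach the binary weights directly.
rng = np.random.default_rng(0)
latent_w = rng.normal(size=5)          # latent real-valued weights
binary_w = np.sign(latent_w)           # binary weights theta = sign(w)

# Numerical derivative of sign() at the latent weights: zero everywhere
# (except exactly at w = 0), which is why latent weights are updated instead.
eps = 1e-6
d_sign = (np.sign(latent_w + eps) - np.sign(latent_w - eps)) / (2 * eps)

print("latent weights:", latent_w)
print("binary weights:", binary_w)
print("d sign / dw   :", d_sign)       # all zeros
```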
### 3.1. Theory and mathematics behind training BNNs

A network based on the BNN technique generally relies on two main techniques: stochastic gradient descent (SGD) and momentum optimization. SGD is the vanilla training technique used to update the weights of an NN through backpropagation, using stochastically chosen subsets or *batches* of the data each iteration. Momentum is a first-order filter technique that takes the exponential moving average (EMA) of the weight updates; it makes sure that noisy updates have less effect on forthcoming weight updates. The formulas for SGD (1) and momentum (2) are given below. Note that the SGD formula incorporates what is called 'weight decay' through the weight decay factor: this factor pushes the network's weights towards smaller values, which generally improves its generalization capabilities.

$w_{i} = w_{i-1}-\varepsilon(m_{i}+\lambda w_{i-1})$ (1)

In the formula for SGD (1), $w_{i}$ represents the latent weight at iteration $i$, $w_{i-1}$ the latent weight at iteration $i-1$, $\varepsilon$ the learning rate, $m_{i}$ the momentum variable at iteration $i$ and $\lambda$ the weight decay factor.

$m_{i} = (1-\gamma)m_{i-1}+\gamma\Delta_{\theta_i}$ (2)

In the formula for momentum (2), $m_{i}$ represents the momentum value at iteration $i$, $m_{i-1}$ the momentum at iteration $i-1$, $\gamma$ the adaptivity rate, and $\Delta_{\theta_i}$ the gradient signal for the binary weight at iteration $i$.

After calculating the latent weights with SGD, the binary weights used in the forward pass are obtained with formula (3), in which $\theta_{i}$ is the binary weight:

$\theta_{i} = \operatorname{sign}(w_{i})$ (3)

For this research, however, the theory of Helwegen et al. [2] was adopted. This theory states that the latent-weight updates can be interpreted as updates of accumulated gradients, which yields a similar formula. In the associated formula (4), $g_{i}$ represents the accumulated gradient at iteration $i$ and $g_{i-1}$ the accumulated gradient at iteration $i-1$:

$g_{i} = g_{i-1}-\varepsilon(m_{i}+\lambda g_{i-1})$ (4)

## 4. Main contribution of the paper [1]

Because the latent weights are approximated as gradient accumulators, some BNN hyperparameters can, according to the paper, be reinterpreted. This yields a better understanding of the effect of tuning these hyperparameters (Section 4.1) and thus enables more goal-oriented optimization. Furthermore, the proposed reinterpretation may also simplify hyperparameter tuning for this type of BNN system altogether (Section 4.2). These two aspects are the main contributions of the paper.

### 4.1. Novel interpretation of BNN hyperparameters

The BNN hyperparameters examined in this research are: weight initialization, learning rate, weight decay, learning rate decay and momentum. The main findings with respect to these hyperparameters are summarized in the subsections below.

#### 4.1.1. Weight initialization

Weight initialization can be reinterpreted due to the approximation of the weights as gradients. Since at the first iteration no gradient has been accumulated yet, it is natural to set all initial gradients to $g_{0} = 0$. However, to ensure that not all $\theta_{i}$ are set to zero, a stochastic sign function is used, which randomly assigns each binary weight a value of either -1 or +1.

#### 4.1.2. Learning rate and weight decay

After some standard mathematical manipulation of formulas (1) and (2), the paper arrives at the following formula for the gradient updates (5):

$g_{i} = (1-\alpha)g_{i-1}+\alpha m_{i}$ (5)

in which $\alpha$ is the product of $\varepsilon$ and $\lambda$. This formula makes clear that the main contribution of the weight decay and learning rate parameters is that their product acts as the adaptivity rate of an EMA filter.
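To tie formulas (1)–(5) together, the following sketch (my own toy code with arbitrary values, not the authors' implementation) runs a few accumulator updates on a stand-in gradient signal: momentum as in (2), the weight-decayed accumulator update as in (4), binary weights via the sign function as in (3), and the zero-initialized accumulator with stochastically signed binary weights of Section 4.1.1. It also checks numerically that update (4) can be written as an EMA with rate $\alpha = \varepsilon\lambda$ over a rescaled momentum signal, in the spirit of (5).

```python
import numpy as np

# Toy sketch (not the authors' code): a few BNN update steps under the
# gradient-accumulator view, with made-up hyperparameter values.
rng = np.random.default_rng(0)
n = 8
eps, lam, gamma = 0.01, 1e-2, 0.1        # learning rate, weight decay, adaptivity rate

g = np.zeros(n)                          # accumulated gradients, g_0 = 0 (Sec. 4.1.1)
theta = rng.choice([-1.0, 1.0], size=n)  # stochastic sign initialization of theta
m = np.zeros(n)                          # momentum buffer

for _ in range(100):
    grad = rng.normal(size=n)            # stand-in for the gradient signal Delta_theta
    m = (1 - gamma) * m + gamma * grad   # eq. (2): momentum as a first-order EMA
    g = g - eps * (m + lam * g)          # eq. (4): SGD with weight decay on the accumulator
    theta = np.where(g == 0, theta, np.sign(g))  # eq. (3): binary weights follow the sign

print("binary weights:", theta)

# Eq. (5): with alpha = eps * lam, the same update is an EMA over a rescaled
# momentum signal (-m / lam), so learning rate and weight decay only matter
# through their product.
alpha = eps * lam
next_g_sgd = g - eps * (m + lam * g)
next_g_ema = (1 - alpha) * g + alpha * (-m / lam)
print("eq. (4) and the EMA form agree:", np.allclose(next_g_sgd, next_g_ema))
```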
#### 4.1.3. Learning rate decay

Since the learning rate is now only incorporated in the update function as a factor that scales the EMA, learning rate decay can be reinterpreted as $\alpha$-decay. This parameter determines how fast the effective window size of the EMA grows over training: a lower decay means the window size grows more slowly, and thus the network converges less rapidly.

#### 4.1.4. Momentum

Since both the momentum formula (2) and the gradient-accumulator formula (5) can be perceived as EMA filters, the resulting filter for a BNN is a second-order linear infinite impulse response (IIR) filter. The resulting update of the accumulated gradient at each iteration is (a numerical sanity check of this recurrence is sketched after Section 4.2):

$g_{i} = \alpha \gamma \Delta_{\theta_i}-(\alpha + \gamma -2) g_{i-1} - (\alpha - 1)(\gamma - 1)g_{i-2}$ (6)

#### 4.1.5. Results from evaluations

In order to evaluate the effects of the reinterpreted hyperparameters, accuracy and flip ratio were measured under varying conditions of the network (presence of clipping and/or scaling, varying $\alpha$ values, varying learning rate, presence or absence of $\alpha$-decay, presence or absence of zero-initialization). These effects were evaluated using a BiRealNet-20 architecture on the CIFAR-10 dataset. The main findings of the evaluations were the following:

1. The second-order filter of gradient accumulation is preferable to a first-order filter, since it filters high-frequency noise while also adapting quickly to recent changes. The first-order filter poses a trade-off between these two properties.
2. A higher $\varepsilon$ and a lower $g_{0}$ are independent of scaling and yield similar flip ratios. A too small $\varepsilon$ or too large $g_{0}$, however, does not reach the same flip ratios. For sufficiently large ratios, scaling both $\varepsilon$ and $g_{0}$ has no effect on training.
3. When using magnitude-dependent networks with clipping and no scaling, the learning rate must be carefully tuned in order not to push all weights outside the clipping region while still having sensible scaling. With initialization set to zero, however, this problem disappears.
4. A too large $\alpha$ causes too many binary weight flips per update, which makes convergence hard. A too small $\alpha$ makes the network converge too quickly, which may lead to sub-optimal performance.
5. Implementing $\alpha$-decay, as opposed to not doing so, leads to better convergence.
6. $\alpha$-decay ensures a decline of the flip ratio towards the end of the training procedure and thus ensures better convergence.

### 4.2. Less complex hyperparameter tuning

Because of the reinterpretation of the hyperparameters, a comparable system can be reduced from seven hyperparameters to only the three described in the previous sections ($\alpha$, $\gamma$ and $\alpha$-decay). This is achieved by dropping the learning rate, weight initialization, clipping and scaling parameters. The paper argues that this makes hyperparameter optimization of such BNN systems easier and less computationally expensive: since there are fewer hyperparameters to tune jointly, finding the optimal combination is a less complex task.
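As a small sanity check on the filtering view above, and to illustrate the three remaining hyperparameters of Section 4.2 ($\alpha$, $\gamma$ and $\alpha$-decay), the following sketch (my own toy values) verifies numerically that cascading the momentum EMA (2) with the accumulator EMA (5) reproduces the second-order recurrence (6), and shows one possible cosine $\alpha$-decay schedule of the kind evaluated in the paper.

```python
import numpy as np

# Toy check (values chosen by me): cascading the two first-order EMAs of
# eqs. (2) and (5) is the same second-order IIR filter as the recurrence (6).
rng = np.random.default_rng(1)
alpha, gamma = 0.02, 0.1
T = 200
delta = rng.normal(size=T)               # stand-in for the per-step gradient signal

# Cascade of the two first-order EMAs: momentum (2), then accumulator (5).
m = g = 0.0
g_cascade = []
for d in delta:
    m = (1 - gamma) * m + gamma * d
    g = (1 - alpha) * g + alpha * m
    g_cascade.append(g)

# Direct second-order recurrence, eq. (6).
g1 = g2 = 0.0                            # g_{i-1} and g_{i-2}
g_second_order = []
for d in delta:
    g0 = alpha * gamma * d - (alpha + gamma - 2) * g1 - (alpha - 1) * (gamma - 1) * g2
    g_second_order.append(g0)
    g2, g1 = g1, g0

print("cascade matches eq. (6):", np.allclose(g_cascade, g_second_order))

def cosine_alpha(step, total_steps, alpha_max):
    """One possible alpha-decay schedule: alpha shrinks towards 0 over training,
    so the EMA window grows and binary weight flips die out."""
    return 0.5 * alpha_max * (1 + np.cos(np.pi * step / total_steps))

print("alpha at start / halfway / end:",
      cosine_alpha(0, T, alpha), cosine_alpha(T // 2, T, alpha), cosine_alpha(T, T, alpha))
```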
## 5. Research the paper builds upon

The paper under review cites a fair number of works; a few of these, however, are clearly recognizable as the main incentives for the research performed. First of all, according to the paper, the theoretical discrepancy between certain optimization techniques and the nature of BNNs has been noted multiple times in recent years, and these studies also demonstrated the importance of tuning the optimization techniques with care. The papers cited to support these claims are Liu et al. [3] and Martinez et al. [4].

The paper by Liu et al. [3] examined whether the common practice of using the Adam algorithm to optimize a BNN classifier was actually the best option. This widely adopted practice was questioned, since a paper by Wilson et al. [5] had empirically shown that Adam finds less optimal minima than plain SGD with momentum for real-valued NNs; it therefore seemed better to use standard SGD with momentum instead. The paper reasoned that with SGD, the size of a weight update during training follows the magnitude of the gradient. In BNNs, however, there is a high chance of these gradients being zero, which makes it hard to update weights under a bad initialization or near local minima. Adam, on the other hand, turned out to rescale these updates based on previous gradients, which ensured the weights could still be updated.

The other cited paper is the one by Martinez et al. [4]. In this paper, the prevailing binarization method for binary convolutional neural networks (BCNNs) is questioned in terms of the performance it yields: it is argued that this "direct binarization approach" produces a high quantization error, which results in low accuracy. Two optimization techniques are proposed to solve this issue. The first technique enforces a loss constraint during training, so that the output of a binary convolution better matches the output of the real-valued convolution in the corresponding layer. This makes more sense than standard backpropagation alone, since backpropagation has proved to be much less effective for BNNs than for real-valued networks. The second technique, called data-driven channel re-scaling, shows how to boost the representation capability of a binary neural network while barely changing the number of operations performed; it proposes to use the full-precision activation signal before the binarization operation.

These papers thus also show that there is a general discrepancy between training BNNs and training real-valued networks with regard to their optimization parameters. This implies a possibly even larger research gap with respect to other related parameters, for other researchers to fill. It should be noted that the paper also mentions other works that have noted the discrepancy between theory and practice with regard to BNNs, such as Tang et al. [7] and Hu et al. [6]. Tang et al. focused on why BNNs performed poorly or failed when trained on larger datasets. Hu et al. were concerned with the fact that contemporary binarization methods applied to 1x1 convolutions caused substantially greater accuracy degradation. The paper also mentions a myriad of other machine learning papers describing different techniques to train BNNs in its related-work section.

The main concept of interpreting the latent weights differently for BNNs is based on the research of Helwegen et al. [2]. In this research, it is argued that latent weights cannot be seen as equivalent to real-valued weights, but rather as so-called accumulated gradients. This boils down to the following mathematical expression (7):

$\tilde{w} = \operatorname{sign}(\tilde{w}) \cdot \left | \tilde{w} \right | = w_{bin} \cdot m, \quad w_{bin} \in \left \{ -1,1 \right \}, \; m \in \left [0,\infty \right )$ (7)

This expression (7) states that a latent weight $\tilde{w}$ is simply the value of its corresponding binary weight $w_{bin}$ multiplied by the latent weight's magnitude $m$. This means that as training goes on and the latent weight increases in magnitude, an increasingly large counteracting gradient is necessary to make the binary weight flip. Updating the latent weights thus has an increasingly stabilizing effect, which allows the flip rate to decay over training and thereby ensures convergence.
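The stabilizing effect described above can be illustrated with a tiny sketch (the numbers are mine): a latent weight factors into its binary part and its magnitude as in (7), and the binary weight only flips once an opposing update exceeds that accumulated magnitude, which is why flips become rarer as training progresses.

```python
import numpy as np

# Toy illustration (numbers mine) of eq. (7): a latent weight factors into its
# binary part and its magnitude, and a flip requires an opposing update larger
# than that accumulated magnitude.
w_latent = 1.7                                     # latent weight after some training
w_bin, magnitude = np.sign(w_latent), abs(w_latent)
assert np.isclose(w_latent, w_bin * magnitude)     # eq. (7)

# A single opposing update of size 0.4 is not enough to flip this binary weight...
update = -0.4
print("flip of a well-trained weight:", np.sign(w_latent + update) != w_bin)      # False

# ...whereas the same update flips a fresh latent weight with small magnitude.
w_small = 0.2
print("flip of a small weight:", np.sign(w_small + update) != np.sign(w_small))   # True
```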
Besides the theoretical explanation, Helwegen et al. also report a few experimental findings. First, it was found that varying the learning rate can yield the same effects as varying the initialization scaling. Second, learning rate decay over the training steps reduces the influence of noise during training, since it further increases the accumulated gradient. [2]

## 6. Research that builds on this paper

A Google Scholar search sorted by relevance shows that, as of the date of writing this blog post, the research paper has not yet been cited by other studies. This is most probably due to the short time the paper has been publicly available, namely since March 2023.

## 7. SWOT analysis

In order to evaluate the contribution of the research paper, a supplementary SWOT analysis was performed.

### 7.1. Strengths

The paper has many strengths, which is to be expected from a more theoretical, analytical paper. First of all, the paper takes a very fundamental approach to explaining the mechanism behind BNN training. Concepts of training in general and of BNNs in particular are concisely explained. This allows for a more thorough understanding of the process, as the assumptions and approximations become more logical because the definitions are well demarcated.

Additionally, the paper provides a lot of recent and relevant references that align well with the formulated problem. It fills the research gap of additional research into specific hyperparameters, while building on previous work that shows its necessity and rationale (Helwegen et al. [2], Liu et al. [3], Martinez et al. [4]).

The paper also validates the performance of a system that applies the approximation of latent-weight optimization as a second-order linear infinite impulse response filter. This was done by training a Bi-RealNet-20 architecture on both the CIFAR-10 and ImageNet datasets and evaluating its performance. For both datasets, accuracy values similar to the state of the art were achieved. This is an important strength, since it shows the technique is viable practically and not just theoretically.

Lastly, the experimental results provide a very clear picture of the role of each of the three hyperparameters. This is done with clear graphs that not only show the two important quantities (accuracy and flip ratio), but also compare networks without weight magnitudes to networks with regular, magnitude-carrying weights.

### 7.2. Weaknesses

The paper itself mentions two weaknesses of the research in its discussion; a third is added by the writer of this blog post. The first is that the filtering-based optimizer is tested on only two datasets and one specific architecture. Applying it to more datasets and/or architectures would have strengthened the experimental value by validating the optimizer's performance under a variety of circumstances. [1]
Secondly, in the paper's own words, it does not "provide understanding on why optimizing BNNs with second-order low pass filters works as well as it does." It shows the influence of the hyperparameters when they are varied, but it does not go into detail about what actually happens in the feature space while training these network types. [1]

A further weakness is the lack of a graph showing the effect of multiple $\alpha$-decay schedules. In the research, only cosine $\alpha$-decay and no $\alpha$-decay were tested. It would have been insightful, and the experimental evidence more solid, if the effect of $\alpha$-decay had been tested for a wider variety of schedules.

### 7.3. Opportunities

The paper offers a couple of opportunities. Firstly, it provides another step in the understanding of training BNNs with latent weights. It might therefore also be an incentive for additional research into this branch of machine learning. This is highly desirable, since these types of networks have proved very promising due to their low memory consumption when deployed on smaller devices. One type of additional research, as suggested by the paper, is trying to understand why the second-order filter works as well as it does. [1]

Secondly, since the paper provides a better understanding of what happens during BNN training, future BNN hyperparameter tuning can be done in a more goal-oriented and efficient way. The paper reduces the original seven hyperparameters to three, which makes optimization procedures such as hyperparameter grid searches much less computationally burdensome.

### 7.4. Threats

It appeared hard to find any threats with regard to the research paper; however, one possible threat could be identified. Researchers might simply build new algorithms on top of this research's results without examining the scientific reasons for why they work. This might lead to blunt assumptions in novel networks based on this theory, resulting in worse-than-optimal performance. This would be similar to the way latent real-valued weights were long deemed analogous to actual real-valued weights, which is exactly why the line of research of the paper under review is a contribution in the first place.

## 8. Discussion

This review tried to offer additional insight into the research paper "*Understanding weight-magnitude hyperparameters in training binary networks*" by Quist et al. [1]. This was done by also diving into the sources on which the paper was built; by combining the theory of both the sources and the research itself, the aim was to provide an additional point of view.

Additionally, the role of this research paper within its own branch of machine learning was assessed. This was done by looking at the research papers it builds upon and by trying to find research that has already built upon it. It became apparent that the research paper provides a good contribution to the field of machine learning research. Not only does it add valuable results to the promising area of BNNs, it also opens possibilities for future research that can further strengthen the knowledge about these types of networks. It turned out, however, that no contemporary research has taken advantage of the results or theories of this study yet: a Google Scholar search sorted by relevance yielded zero results.

Furthermore, a SWOT analysis was performed on the research paper.
This analysis showed both the clarity with which the paper is written and the value of the results it obtained, as well as the opportunities that come with these results. It should be noted that the research paper does come with a few weaknesses, most of which were already indicated by the researchers themselves. Apart from the small threat of the results being implemented too eagerly in upcoming research, there is not much to worry about with regard to the results of the paper.

## 9. References

[1] **Understanding weight-magnitude hyperparameters in training binary networks**. Joris Quist, Yunqiang Li and Jan van Gemert. *ICLR, 2023.*

[2] **Latent weights do not exist: Rethinking binarized neural network optimization**. Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng and Roeland Nusselder. *2019.*

[3] **How Do Adam and Training Strategies Help BNNs Optimization?** Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang and Kwang-Ting Cheng. *2021.*

[4] **Training binary neural networks with real-to-binary convolutions**. Brais Martinez, Jing Yang, Adrian Bulat and Georgios Tzimiropoulos. *ICLR, 2020.*

[5] **The marginal value of adaptive gradient methods in machine learning**. A. C. Wilson, R. Roelofs, M. Stern, N. Srebro and B. Recht. *Advances in Neural Information Processing Systems, pp. 4148–4158, 2017.*

[6] **Elastic-link for binarized neural networks**. Jie Hu, Ziheng Wu, Vince Tan, Zhilin Lu, Mengze Zeng and Enhua Wu. *Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 942–950, 2022.*

[7] **How to train a compact binary neural network with high accuracy?** Wei Tang, Gang Hua and Liang Wang. *AAAI, pp. 2625–2631, 2017.*
